Readme for NCBI blast ftp site

Last updated on February 15, 2004

This file lists the subdirectories and files found on the NCBI BLAST
ftp site (ftp://ftp.ncbi.nlm.nih.gov/blast/). It provides basic
information on file content and on how the files should be used.

1. Introduction

The NCBI BLAST ftp site provides standalone blast, client server
blast, and wwwblast packages for different platforms. It also provides
commonly used blast databases in preformatted as well as FASTA format.
Some documents on the blast executables and other related subjects are
also provided.

2. File list and content

The files are described in the tables below, one table for each
directory or subdirectory.

2.1 ftp://ftp.ncbi.nlm.nih.gov/blast/ directory content

The blast ftp directory contains several subdirectories, each for a
specific set of files.

+------------------+-------------------------------------------------+
|Name              |Content                                          |
+------------------+-------------------------------------------------+
blastftp.txt        this file
db                  subdirectory with databases, in preformatted or
                    FASTA form
demo                demonstration programs and documents from blast
                    developers
documents           documents for the standalone blast, netblast, and
                    wwwblast programs
executables         archives for binary distribution of blast programs
matrices            protein and nucleotide score matrices, only a
                    subset of which is supported by blast
temp                temporary directory for miscellaneous files
+------------------+-------------------------------------------------+

2.2 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/db/ subdirectory

Databases larger than two gigabytes (2 GB) are formatted in multiple
volumes, which are named using the "database.##.tar.gz" convention.
All relevant volumes are required. An alias file is provided so that
the database can be called using the alias name without the extension
(.nal or .pal). For example, to call the est database, simply use the
"-d est" option on the command line (without the quotes).
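
The volume naming convention above can be sketched as a small shell
helper. This is only an illustration, not part of the NCBI
distribution: the function name volume_names and its two arguments
are hypothetical, and the number of volumes is assumed to be read from
the ftp directory listing, since it changes as a database grows.

```shell
#!/bin/sh
# Hypothetical helper: print the archive names of a multi-volume
# database that follows the "database.##.tar.gz" convention.
# $1 = database name (e.g. est)
# $2 = number of volumes (assumed known from the directory listing)
volume_names() {
    db=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        # Volume numbers start at 00 and are zero-padded to two digits.
        printf '%s.%02d.tar.gz\n' "$db" "$i"
        i=$((i + 1))
    done
}

# Example: the three est volumes listed for this subdirectory.
volume_names est 3

# Assumed follow-up on Unix or Linux, once the files are downloaded:
#   for f in $(volume_names est 3); do gzip -dc "$f" | tar -xf -; done
```

Once every volume is unpacked into one directory, a search refers only
to the alias name, for example "blastall -p blastn -d est -i
query.fasta -o result.txt", where query.fasta and result.txt are
placeholder file names.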

Certain databases are subsets of a larger parent database. For those
databases, mask files, rather than actual databases, are provided. A
mask file needs the parent database to function properly, and the
parent database should be generated on the same day as the mask file.
For example, to use the preformatted swissprot database,
swissprot.tar.gz, one also needs the nr.tar.gz with the same date
stamp.

To use a preformatted blast database file, first inflate the file
using gzip (Unix, Linux), WinZip (Windows), or StuffIt Expander (Mac),
then extract the component files from the resulting tar file using tar
(Unix, Linux), WinZip (Windows), or StuffIt Expander (Mac). The
resulting files are ready for BLAST.

+---------------------+----------------------------------------------+
|Name                 |Content                                       |
+---------------------+----------------------------------------------+
FASTA                  subdirectory with databases in FASTA format
blastdb.txt            content list of the blast database
est.00.tar.gz          first volume of the est database
est.01.tar.gz          second volume of the est database
est.02.tar.gz          third volume of the est database; all volumes
                       are needed to reconstitute the complete est
                       database
est_human.tar.gz       human est database, a mask file that requires
                       all volumes of est to work
est_mouse.tar.gz       mouse est database, a mask file that requires
                       all volumes of est to work
est_others.tar.gz      est database without human/mouse entries, a
                       mask file that requires all volumes of est
gss.tar.gz             genomic survey sequence database
htgs.00.tar.gz         first volume of the htgs database
htgs.01.tar.gz         second volume of the htgs database
htgs.02.tar.gz         third volume of the htgs database
htgs.03.tar.gz         fourth volume of the htgs database; all volumes
                       are needed to reconstitute the complete htgs
                       database
human_genomic.tar.gz   human chromosome database containing
                       concatenated contigs with adjusted gaps
                       represented by N's
nr.tar.gz              non-redundant protein database
nt.00.tar.gz           first volume of the nucleotide nr database
nt.01.tar.gz           second volume of the nucleotide nr database
nt.02.tar.gz           third volume of the nucleotide nr database; all
                       volumes are needed to reconstitute the complete
                       nt database
other_genomic.tar.gz   chromosome database for organisms other than
                       human
pataa.tar.gz           patent protein database
patnt.tar.gz           patent nucleotide database
pdbaa.tar.gz           protein sequence database for pdb entries. It
                       is a mask file and requires nr.tar.gz
pdbnt.tar.gz           nucleotide sequence database for pdb entries.
                       They are not coding sequences for the
                       corresponding protein structure entries!
sts.tar.gz             sequence tagged sites database
swissprot.tar.gz       swissprot sequence database, last major
                       release. It is a mask file and requires
                       nr.tar.gz to work properly
taxdb.tar.gz           taxonomy id database for use with the new
                       version of the blast database (not fully
                       implemented yet)
wgs.00.tar.gz          first volume of the wgs assembly database
wgs.01.tar.gz          second volume of the wgs assembly database
wgs.02.tar.gz          third volume of the wgs assembly database
wgs.03.tar.gz          fourth volume of the wgs assembly database
wgs.04.tar.gz          fifth volume of the wgs assembly database
wgs.05.tar.gz          sixth volume of the wgs assembly database; all
                       volumes are needed
+---------------------+----------------------------------------------+

2.2.1 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA
      subdirectory

The FASTA database files are now stored in this subdirectory, which
also contains some additional databases that are not available via the
NCBI BLAST pages. Due to file size issues, the full est database is
not provided. One needs to get the three subsets and concatenate them
to obtain the complete est database.

These databases need to be formatted using the formatdb program found
in the standalone blast executable package. The recommended command
lines are:

    formatdb -i input_db -p F -o T     for nucleotide
    formatdb -i input_db -p T -o T     for protein

For additional information on formatdb, please see the formatdb.txt
document under the /blast/documents/ directory.
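
The two steps above (rebuilding the full est FASTA file from its three
subsets, then choosing the formatdb command line) can be sketched as
follows. This is an illustration under stated assumptions: the
function names concat_fasta and formatdb_cmd are hypothetical, the
output file name est is arbitrary, and formatdb_cmd only prints the
recommended command line so it can be reviewed before it is run with
the formatdb binary from the standalone package.

```shell
#!/bin/sh
# Hypothetical helper: decompress gzip'ed FASTA files and concatenate
# them, in the order given, into a single FASTA file.
# $1 = output file, remaining arguments = .gz input files.
concat_fasta() {
    out=$1
    shift
    # gzip -dc writes the decompressed data to standard output and
    # leaves the downloaded .gz files untouched.
    gzip -dc "$@" > "$out"
}

# Hypothetical helper: print the formatdb command line recommended in
# this readme. $1 = FASTA file, $2 = "nucl" or "prot".
formatdb_cmd() {
    case $2 in
        nucl) echo "formatdb -i $1 -p F -o T" ;;
        prot) echo "formatdb -i $1 -p T -o T" ;;
        *)    echo "unknown type: $2 (use nucl or prot)" >&2
              return 1 ;;
    esac
}

# Example, run in the directory that holds the downloaded subsets:
#   concat_fasta est est_human.gz est_mouse.gz est_others.gz
#   formatdb -i est -p F -o T
formatdb_cmd est nucl
formatdb_cmd nr prot
```

Printing the command rather than running it keeps the sketch usable on
a machine where formatdb is not yet installed.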

+------------------+--------------------------------------------------+
|Name              |Content                                           |
+------------------+--------------------------------------------------+
alu.a.gz            proteins translated from alu.n
alu.n.gz            alu repeat sequences
drosoph.aa.gz       Drosophila proteins from genome annotation
drosoph.nt.gz       Drosophila genome
ecoli.aa.gz         E. coli K-12 proteins from genome annotation
ecoli.nt.gz         E. coli K-12 genomic contigs
est_human.gz        human subset of the est database
est_mouse.gz        mouse subset of the est database
est_others.gz       subset of est other than human or mouse entries
gss.gz              Genomic Survey Sequences (mostly BAC ends)
htgs.gz             High Throughput Genomic Sequences
human_genomic.gz    Human chromosomes formed by concatenating genomic
                    contig assemblies (NT_######) and adjusting the
                    gaps with N's
igSeqNt.gz          Immunoglobulin nucleotide sequences
igSeqProt.gz        Immunoglobulin protein sequences
mito.aa.gz          proteins from the annotated mitochondrial genomes
mito.nt.gz          mitochondrial genomes
month.aa.gz         protein sequences released or updated in the past
                    30 days
month.est_human.gz  human subset of EST released/updated in the past
                    30 days
month.est_mouse.gz  mouse subset of EST released/updated in the past
                    30 days
month.est_others.gz EST, without entries from human or mouse, released
                    or updated in the past 30 days
month.gss.gz        gss entries released/updated in the past 30 days
month.htgs.gz       htgs entries released/updated in the past 30 days
month.nt.gz         subset of nt released/updated in the past 30 days
nr.gz               non-redundant protein sequence database
nt.gz               nucleotide database from GenBank, excluding the
                    htgs, est, gss, sts, and pat batch divisions and
                    wgs entries. Not non-redundant.
other_genomic.gz    Chromosome entries other than human
pataa.gz            Patent protein sequence database
patnt.gz            Patent nucleotide sequence database
pdbaa.gz            protein sequences for pdb entries
pdbnt.gz            nucleotide entries for pdb entries. They are NOT
                    the coding sequences for the corresponding protein
                    entries
sts.gz              Sequence Tagged Sites database
swissprot.gz        swissprot database, last major release
vector.gz           vector sequences from the synthetic (syn) division
                    of GenBank
wgs.gz              Whole Genome Shotgun sequence assembly
yeast.aa.gz         protein translations from yeast genome annotation
yeast.nt.gz         yeast genomic sequence
+------------------+--------------------------------------------------+

2.3 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/demo/ directory

This directory contains some technical presentations from the BLAST
developers, along with some demo tools and documentation relevant to
BLAST.

+------------------------+-----------------------------------------------+
|Name                    |Content                                        |
+------------------------+-----------------------------------------------+
README.blast_demo         readme for blast_demo package
README.first              readme for this directory
README.parse_blast_xml    readme for parse_blast_xml package
blast_demo.tar.gz         blast_demo package on blast db, blast object,
                          and reformatting blast alignment from
                          blastobj file
blast_exercises.doc       blast exercise questions and answers
blast_programming.ppt     PowerPoint presentation on BLAST programming
blast_talk.ppt            PowerPoint presentation (O'Reilly conference)
ieee_blast.final.ppt      PowerPoint presentation (IEEE conference)
ieee_talk.pdf             Above IEEE presentation in PDF format
parse_blast_xml.tar.gz    demo package on parsing xml styled blast
                          output
splitd.ppt                PowerPoint presentation on NCBI BLAST server's
                          splitd implementation
test_suite.tar.gz         test package
+------------------------+-----------------------------------------------+

2.4 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/documents/
    directory

This directory contains copies of the documentation for the different
BLAST programs distributed from this ftp site under the
/blast/executables/ directory. blast.txt also contains a detailed
release history.

+------------------------+-----------------------------------------------+
|Name                    |Content                                        |
+------------------------+-----------------------------------------------+
blast.txt                 readme for blastall and blastpgp
blastclust.txt            readme for blastclust
developer                 subdirectory with additional documentation
  blast_seqalign.txt      describing seqalign function
  readdb.txt              describing readdb function
  urlapi.txt              a short introduction to the BLAST URL API,
                          which supersedes the blasturl
formatdb.txt              readme for formatdb program
impala.txt                readme for impala
megablast.txt             readme for megablast
netblast.txt              readme for netblast (blastcl3)
rpsblast.txt              readme for rpsblast
xml                       subdirectory with .dtd and .mod field
                          description files for blast xml output
xml/NCBI_BlastOutput.dtd  dtd file for blast xml output
xml/NCBI_BlastOutput.mod  mod file for blast xml output
xml/NCBI_Entity.mod       mod file for NCBI xml file
xml/README.blxml          readme on blast xml output
+------------------------+-----------------------------------------------+

2.5 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/executables/
    directory

This directory contains several subdirectories, each for a specific
subset of executable BLAST programs:

/LATEST-BLAST       subdirectory contains the standalone blast
                    binaries from the latest major versioned release.
/LATEST-NETBLAST    subdirectory contains the netblast binaries from
                    the latest major versioned release.
/LATEST-WWWBLAST    subdirectory contains the wwwblast binaries from
                    the latest major versioned release.
/release            different releases, with the last one linked to
                    the LATEST directories
/snapshot           subdirectory contains patches or intermediate
                    updates put up in between major releases.

For previous releases, go to the release subdirectory, where the old
major releases are archived back to version 2.0.10.

2.5.1 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/executables/
      LATEST-BLAST, /LATEST-NETBLAST, and /LATEST-WWWBLAST
      subdirectories

All three subdirectories link to the latest release directory, which
contains the standalone BLAST executables package (archives whose
names begin with blast), the blastcl3 client (archives whose names
begin with netblast), and server blast (archives whose names begin
with wwwblast).

The standalone archive is needed to set up BLAST locally on the user's
own machine. It also provides the tools necessary to prepare custom
databases and to retrieve sequences from these prepared databases.
Archives are available for commonly used platforms.

The blast client archive contains the blastcl3 program, which
formulates a BLAST search locally and then forwards the search to the
NCBI blast server for processing. The search results returned by the
NCBI BLAST server are saved to a user-specified file on the local
disk.

The server blast archive contains web pages with embedded blast search
forms, similar to those of NCBI, that can process BLAST search
requests against a local set of databases and return the results to a
browser window. wwwblast is now in sync with the NCBI toolkit and the
two packages above.
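
The archive naming pattern and the blastcl3 workflow described above
can be sketched as two small shell helpers. This is a hedged
illustration only: the function names archive_name and blastcl3_cmd
are hypothetical, the naming pattern is inferred from the file names
listed for release 2.2.8, and query.fasta and result.txt are
placeholder file names. The -p, -d, -i, and -o options are the program,
database, query file, and output file options documented in
netblast.txt.

```shell
#!/bin/sh
# Hypothetical helper: build an archive name following the pattern
# <package>-<version>-<platform>.<suffix> seen in the LATEST
# directories.
# $1 = package (blast, netblast, wwwblast), $2 = version,
# $3 = platform tag (e.g. ia32-linux).
archive_name() {
    case $3 in
        # The Windows packages are listed as .exe files; every other
        # platform in this readme is listed as a .tar.gz archive.
        *-win32) echo "$1-$2-$3.exe" ;;
        *)       echo "$1-$2-$3.tar.gz" ;;
    esac
}

# Hypothetical helper: print a blastcl3 command line.
# $1 = program, $2 = database, $3 = query file, $4 = output file.
blastcl3_cmd() {
    echo "blastcl3 -p $1 -d $2 -i $3 -o $4"
}

archive_name netblast 2.2.8 ia32-linux
archive_name blast 2.2.8 ia32-win32
blastcl3_cmd blastp nr query.fasta result.txt
```

The second helper only prints the command, so the search is not sent
to the NCBI server until the printed line is run by hand.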

+------------------------------------+-------------------------------+
|Name                                |Content                        |
+------------------------------------+-------------------------------+
MD5SUM.txt
blast-2.2.8-alpha-osf1.tar.gz         Standalone for COMPAQ/HP alpha
                                      machine (OSF 5.1 and above)
blast-2.2.8-amd64-linux.tar.gz        Standalone for AMD 64-bit PC
                                      running Linux
blast-2.2.8-ia32-freebsd.tar.gz       Standalone for Intel Pentium PC
                                      running FreeBSD
blast-2.2.8-ia32-linux.tar.gz         Standalone for Intel Pentium PC
                                      running Linux
blast-2.2.8-ia32-win32.exe            Standalone for Intel Pentium PC
                                      running Windows
blast-2.2.8-ia64-linux.tar.gz         Standalone for Intel Itanium PC
                                      running Linux
blast-2.2.8-mips-irix-32-bit.tar.gz   Standalone for 32-bit SGI
blast-2.2.8-mips-irix.tar.gz          Standalone for 64-bit SGI
blast-2.2.8-powerpc-macosx.tar.gz     Standalone for MacOSX (terminal)
blast-2.2.8-sparc-solaris.tar.gz      Standalone for Sun Sparc station
                                      running Solaris
netblast-2.2.8-alpha-osf1.tar.gz      netblast for COMPAQ/HP alpha
                                      machine (OSF 5.1 and above)
netblast-2.2.8-amd64-linux.tar.gz     netblast for AMD 64-bit PC
                                      running Linux
netblast-2.2.8-ia32-freebsd.tar.gz    netblast for Intel Pentium PC
                                      running FreeBSD
netblast-2.2.8-ia32-linux.tar.gz      netblast for Intel Pentium PC
                                      running Linux
netblast-2.2.8-ia32-win32.exe         netblast for Intel Pentium PC
                                      running Windows
netblast-2.2.8-ia64-linux.tar.gz      netblast for Intel Itanium PC
                                      running Linux
netblast-2.2.8-mips-irix.tar.gz       netblast for SGI 32-bit system
netblast-2.2.8-powerpc-macosx.tar.gz  netblast for MacOSX
netblast-2.2.8-sparc-solaris.tar.gz   netblast for Sun Sparc station
                                      running Solaris
wwwblast-2.2.8-alpha-osf1.tar.gz      wwwblast for COMPAQ/HP alpha
                                      machine (OSF 5.1 and above)
wwwblast-2.2.8-amd64-linux.tar.gz     wwwblast for AMD 64-bit PC
                                      running Linux
wwwblast-2.2.8-ia32-freebsd.tar.gz    wwwblast for Intel Pentium PC
                                      running FreeBSD
wwwblast-2.2.8-ia32-linux.tar.gz      wwwblast for Intel Pentium PC
                                      running Linux
wwwblast-2.2.8-ia64-linux.tar.gz      wwwblast for Intel Itanium PC
                                      running Linux
wwwblast-2.2.8-mips-irix.tar.gz       wwwblast for SGI 32-bit system
wwwblast-2.2.8-powerpc-macosx.tar.gz  wwwblast for MacOSX
wwwblast-2.2.8-sparc-solaris.tar.gz   wwwblast for Sun Sparc station
                                      running Solaris
+------------------------------------+-------------------------------+

2.5.2 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/executables/
      release subdirectory

This directory contains past major releases of BLAST, as far back as
version 2.0.10. Each release is in its own subdirectory.

2.5.3 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/executables/
      snapshot subdirectory

This subdirectory contains intermediate enhanced or patched archives
released after the last major release. They are organized by date and
contain binaries only for the affected platforms.

2.5.4 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/executables/
      special subdirectory

From time to time, we make binaries for some rare platforms under
special circumstances. Those files are archived here.

2.6 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/matrices
    directory

This directory contains the scoring matrices that BLAST can use to
assess alignments. The files are specially formatted text files that
can be viewed directly in a browser. For valid statistical analysis,
blastn uses only an identity matrix, and blastp supports only a
limited subset of the BLOSUM and PAM matrices: BLOSUM45, BLOSUM62, and
BLOSUM80, plus PAM30 and PAM70.

2.7 File content of the ftp://ftp.ncbi.nlm.nih.gov/blast/temp
    subdirectory

A left-over subdirectory of miscellaneous files or tools.

3. Technical Support

Additional questions/comments on this ftp site should be directed to
the NCBI blast-help group at:

    blast-help@ncbi.nlm.nih.gov

Other questions on general NCBI resources should be directed to:

    info@ncbi.nlm.nih.gov