N. tabacum Data

 

The goal of the Tobacco Genome Initiative (TGI) was to tag at least 90% of the genes in Nicotiana tabacum. The project has been completed successfully.

The TGI took more than one approach to reach this goal, but the bulk of the data comes from MethylFiltered genomic sequencing. Some sheared BACs are also available. Other data, such as ESTs, can be downloaded from NCBI.

Assemblies of the N. tabacum data are available for download. The files have been split up to keep download sizes manageable and to let you download only the subsets you are interested in.

All files were tarred and gzipped on a Sun Solaris system. (Right-click a link and choose "Save Target As" to download.)
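Once an archive has been downloaded, it can be unpacked with any standard tar/gzip tool. As a minimal sketch, the following uses Python's standard `tarfile` module; the function name `extract_archive` and the destination directory are illustrative choices, not part of the original distribution.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as tar:
        members = tar.getmembers()
        # Refuse members that would be written outside the destination directory.
        for member in members:
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(path=dest)
    return [member.name for member in members]
```

For example, `extract_archive("OSB_BACs.tar.gz", "OSB_BACs")` would unpack the sheared BAC data into a local `OSB_BACs` directory, assuming the archive is in the current working directory.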

 

Please keep in mind that this project ended in 2008, and therefore support for individual analyses is no longer available.


OSB_BACs.tar.gz (81MB)

This file contains the data for the sheared BACs that were sequenced in-house using Sanger sequencing. The directory includes the contigs from the assembly, the reads (including quality files) for each assembled BAC, and an example of the reads_config.xml file that was used with Arachne for the assembly. This XML file contains information such as insert sizes. All of these BACs were assembled with Arachne using the parameters contained in this file.


Tobacco_MF_fasta.tar.gz (426 MB)

This tarball contains three files.

The first, MethylFiltered.Decon.masked.fasta, contains the original reads used for the assembly.

They have been "Decontaminated", meaning that there should be no mitochondrial or chloroplast sequences and no vector or other contaminants in these sequences, and they have been repeat masked using a plant-specific library.

The other two files are the contigs and singletons from the CAP3 assembly of this data. CAP3 was run with mostly default parameters, with the exception of -p 90, which requires at least 90% identity in the overlapping regions that define a join.
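The exact command line used for the assembly is not recorded on this page, so the following is only a sketch of how a comparable CAP3 run might be launched from Python. It assumes a `cap3` executable is on the PATH; the helper names `cap3_command` and `run_cap3` and the log file path are hypothetical. Only the `-p` option mentioned above is set, and all other parameters are left at their defaults.

```python
import subprocess


def cap3_command(fasta_path, percent_identity=90):
    """Build a CAP3 command line that sets only the overlap percent identity cutoff."""
    return ["cap3", str(fasta_path), "-p", str(percent_identity)]


def run_cap3(fasta_path, log_path, percent_identity=90):
    """Run CAP3 and write its standard output to log_path (assumes cap3 is installed)."""
    with open(log_path, "w") as log:
        subprocess.run(cap3_command(fasta_path, percent_identity), stdout=log, check=True)
```

CAP3 writes its results next to the input file using the input name plus a `.cap.` suffix, which matches the naming of the ACE file distributed here.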


Tobacco_MF_qual.tar.gz (545 MB)

Tobacco_MF_qual.tar.gz contains two files: one with the quality values for the reads that went into the assembly and one with the quality values for the resulting contigs.
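The layout of the quality files is not described here. The sketch below assumes the common FASTA-style layout, in which each record has a `>name` header line followed by whitespace-separated integer quality scores. The function names `parse_qual` and `mean_quality` are illustrative.

```python
def parse_qual(lines):
    """Yield (record name, list of integer scores) from FASTA-style quality text.

    Assumes '>name' header lines followed by whitespace-separated integers.
    """
    name, scores = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, scores
            header = line[1:].split()
            # Use the first word of the header as the record name.
            name, scores = (header[0] if header else ""), []
        elif name is not None:
            scores.extend(int(token) for token in line.split())
    if name is not None:
        yield name, scores


def mean_quality(scores):
    """Return the average quality score, or 0.0 for an empty record."""
    return sum(scores) / len(scores) if scores else 0.0
```

Because `parse_qual` accepts any iterable of lines, an open file handle can be passed directly, which avoids loading a file of several hundred megabytes into memory at once.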


MethylFiltered.Decon.masked.fasta.cap.ace.gz (174 MB)

MethylFiltered.Decon.masked.fasta.cap.ace.gz is the ACE file for the assembly. It records the placement of the reads in the contigs, along with other information about the assembly that you may find useful.
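As a rough illustration of how read placements can be pulled out of an ACE file, the sketch below relies on two record types from the ACE format: `CO` lines, which start a contig, and `AF` lines, which give a read name, its orientation (`U` or `C`), and its padded start position in the consensus. The function names are illustrative, and the rest of the file (consensus bases, qualities, read sequences) is ignored.

```python
import gzip


def read_placements(ace_lines):
    """Map each read name to (contig name, orientation, padded start position)."""
    placements = {}
    current_contig = None
    for line in ace_lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CO" and len(fields) == 6:
            # CO <contig> <bases> <reads> <base segments> <U or C>
            current_contig = fields[1]
        elif fields[0] == "AF" and len(fields) == 4 and current_contig is not None:
            # AF <read> <U or C> <padded start position in consensus>
            placements[fields[1]] = (current_contig, fields[2], int(fields[3]))
    return placements


def read_placements_from_gz(path):
    """Stream a gzipped ACE file line by line without decompressing it to disk."""
    with gzip.open(path, "rt") as handle:
        return read_placements(handle)
```

Start positions can be zero or negative when a read begins before the first base of the consensus, so they are kept as signed integers.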


Tobacco_MF_other.tar.gz (5 MB)

This directory contains a quick sketch of the assembly protocol used, the number of bases (or characters) in the contigs file and the singlets file, and a file called MF_assm.Contig-to-raw.map, which is simply a list of the reads that went into each contig. (We use this to answer a common question from researchers: "Is this read part of a contig?")
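The layout of MF_assm.Contig-to-raw.map is not documented on this page. Purely as an assumption, the sketch below treats each line as a contig name followed by the names of its reads, separated by whitespace; check the actual file and adjust the parsing if it differs. The function names are hypothetical.

```python
def load_read_to_contig(lines):
    """Build a read -> contig lookup.

    Assumed line layout (not confirmed by the source):
        <contig name> <read name> [<read name> ...]
    """
    read_to_contig = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and contigs with no reads listed
        contig, reads = fields[0], fields[1:]
        for read in reads:
            read_to_contig[read] = contig
    return read_to_contig


def contig_for_read(read_to_contig, read_name):
    """Return the contig containing read_name, or None if the read is not in a contig."""
    return read_to_contig.get(read_name)
```

With the lookup loaded once, each "is this read part of a contig?" query becomes a single dictionary access.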