Cornell, New York
December 14, 2000
A weedy, inedible member of the mustard family, related to
broccoli and cauliflower, has become the first plant to yield the
secrets of its primordial origins. In a computational research effort at
Cornell University, the plant, Arabidopsis thaliana, was shown to contain
genetic evidence of ancient genome duplications that occurred between
50 million and 200 million years ago.
The finding, say the Cornell researchers, will be invaluable to those
using Arabidopsis as a genetic model for other plant species, helping them
unlock genes for important traits in agricultural crops such as corn,
tomatoes and wheat.
The researchers report on their discovery in the latest edition of
the journal Science (Dec. 15, 2000).
A decade ago, Arabidopsis was widely adopted by plant scientists as
an easily manipulated model for other plants because it is simple to
grow in the laboratory, has a short life cycle and has a small genome -- only
about 140 million base pairs of DNA compared with wheat, which might have
as many as 16 billion pairs. This year, the entire DNA sequence of the plant
was completed, and for the first time researchers could read the sequences
of the roughly 25,000 genes an organism needs to function as a flowering
plant. Working from this genome sequence, which is in the public domain on
the Internet, the Cornell researchers used computers to sort through the
plant's DNA and find its genetic roots.
"We can take the entire genome of one plant and look back at it,"
says Steven D. Tanksley, the Liberty Hyde Bailey professor of plant
breeding at Cornell and an author on the paper. "We are going back into genetic
time, and we can see what the ancient genome looked like. If we can
understand what the ancestral gene content in one plant is, then we can use that
to learn the gene content in other plants." Tanksley and the lead
researcher, Todd Vision, a Cornell visiting scientist, explained that in many
plant genomes the genes are separated by long stretches of noncoding DNA.
Tanksley suggested that understanding a genome is like driving along a
highway. On the East Coast, you do not have to drive far before you reach
another city, while out west, there are long distances between cities.
Because the genes in Arabidopsis are packed closely together, like East
Coast cities, scientists can gather more genetic information from it in a
shorter period of time. Says Tanksley:
"Arabidopsis is the East Coast of DNA sequencing."
The researchers used a computer program called BLAST to classify the
thousands of genes in Arabidopsis into gene families. BLAST (an
acronym for Basic Local Alignment Search Tool) is a sequence similarity
program designed to support analysis of nucleotide and protein databases. It
was developed at the National Center for Biotechnology Information, part
of the National Institutes of Health, in Bethesda, Md. The researchers then
used novel algorithms to find large chunks of the chromosomes that were
duplicated long ago.
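The release does not explain how BLAST hits were turned into gene families. One common approach is single-linkage clustering: any two genes whose BLAST match passes a significance cutoff are placed in the same family. This is offered only as an illustration, not as the Cornell team's method. The minimal sketch below assumes the all-against-all hits have already been parsed into (query, subject, e-value) tuples. The gene names and the 1e-10 cutoff are hypothetical.

```python
from collections import defaultdict


def gene_families(hits, max_evalue=1e-10):
    """Group genes into families by single-linkage clustering of BLAST hits.

    hits: iterable of (query_id, subject_id, evalue) tuples from an
    all-against-all search. max_evalue is an assumed cutoff, not a
    value reported by the Cornell team.
    """
    parent = {}

    def find(gene):
        # Union-find lookup with path halving; unseen genes start as their own root.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for query, subject, evalue in hits:
        root_q, root_s = find(query), find(subject)
        # Self-hits and weak hits register the genes but link nothing.
        if query != subject and evalue <= max_evalue and root_q != root_s:
            parent[root_s] = root_q

    families = defaultdict(list)
    for gene in parent:
        families[find(gene)].append(gene)
    return sorted(sorted(members) for members in families.values())


# Hypothetical hits: g1-g2-g3 chain together; the g4-g5 hit is too weak to link.
hits = [
    ("g1", "g2", 1e-50),
    ("g2", "g3", 1e-30),
    ("g4", "g5", 1e-5),
    ("g4", "g4", 0.0),
]
print(gene_families(hits))
```

Single linkage is simple, but it can chain distantly related genes into one oversized family. Stricter schemes require several links, or a minimum alignment length, before merging two groups.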
When a whole genome is duplicated, all of an organism's genetic material
doubles, creating what is known as a polyploid. The researchers
inferred that Arabidopsis was an ancient polyploid because it contained
evidence of multiple duplications.
Although duplicated chromosomes diverged from one another and became
scrambled over the eons, the research team was able to find 103 duplicated
chromosome segments that ranged in age from 50 million to 200 million
years. "We figured out where gene family members are located and
used that information to find the ancient duplicated segments," says Vision,
who is a molecular biologist at the Center for Agricultural Bioinformatics
(CAB) at Cornell. The CAB is supported by the U.S. Department of Agriculture,
Agricultural Research Service, in partnership with the College of
Agriculture and Life Sciences and the Theory Center at Cornell.
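Vision describes locating gene family members and using their positions to find duplicated segments. That idea can be illustrated with a deliberately simplified diagonal scan. Each pair of family members is an "anchor." A duplicated segment shows up as several anchors whose two copies keep the same spacing and order along both chromosomes. This is a hypothetical sketch, not the published algorithm: it ignores inverted segments and requires an exact constant offset between the copies. The `max_gap` and `min_anchors` defaults are made-up values.

```python
from collections import defaultdict
from itertools import combinations


def duplicated_segments(positions, families, max_gap=3, min_anchors=3):
    """Report runs of paralog pairs that lie on one diagonal of a gene-order dot plot.

    positions: dict gene_id -> (chromosome, index of the gene along that chromosome)
    families:  list of gene-id lists, one per gene family
    Returns tuples (chrom_a, start_a, end_a, chrom_b, start_b, end_b) in gene indices.
    Simplifications (assumed, not the published method): same-orientation
    copies only, and the index offset between the copies must stay constant.
    """
    # Every pair of family members is an anchor; bucket anchors by diagonal.
    diagonals = defaultdict(list)
    for family in families:
        for a, b in combinations(family, 2):
            (chrom_a, i), (chrom_b, j) = sorted([positions[a], positions[b]])
            diagonals[(chrom_a, chrom_b, j - i)].append(i)

    segments = []
    for (chrom_a, chrom_b, offset), starts in diagonals.items():
        starts.sort()
        # Split each diagonal into runs whose neighboring anchors are close.
        runs = [[starts[0]]]
        for i in starts[1:]:
            if i - runs[-1][-1] <= max_gap:
                runs[-1].append(i)
            else:
                runs.append([i])
        for run in runs:
            if len(run) >= min_anchors:
                segments.append((chrom_a, run[0], run[-1],
                                 chrom_b, run[0] + offset, run[-1] + offset))
    return sorted(segments)


# Hypothetical layout: three collinear pairs between chromosomes "1" and "2",
# plus one stray pair that does not fall on their diagonal.
positions = {
    "a0": ("1", 0), "b0": ("2", 10),
    "a1": ("1", 1), "b1": ("2", 11),
    "a3": ("1", 3), "b3": ("2", 13),
    "a5": ("1", 5), "b2": ("2", 2),
}
families = [["a0", "b0"], ["a1", "b1"], ["a3", "b3"], ["a5", "b2"]]
print(duplicated_segments(positions, families))
```

Allowing small gaps along the diagonal tolerates genes that were lost from one copy after duplication. Real analyses also test whether a run is longer than chance would produce.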
With help from the dating estimates obtained by paleobotanists, the
team was able to look at the duplicated gene sequences and deduce when the
genome duplications in Arabidopsis occurred. The team found that a
few large duplication events were responsible for the pattern they saw.
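The release does not say exactly how the paleobotanists' dates were combined with the duplicated gene sequences. A standard way to turn sequence divergence into time is a molecular clock. Suppose two gene copies differ by K substitutions per site and each lineage changes at a constant rate r. Then the copies split about T = K / (2r) ago, and r can be calibrated from a divergence whose age is known from fossils. The sketch below assumes such a clock. All divergence values and the 100-million-year calibration point are hypothetical.

```python
from statistics import median


def calibrate_rate(k_reference, age_reference_myr):
    """Per-lineage substitution rate (per site per million years) from a
    reference divergence K of known age T, using r = K / (2T)."""
    return k_reference / (2.0 * age_reference_myr)


def pair_age(k_pair, rate):
    """Estimated age in million years of one duplicated gene pair: T = K / (2r)."""
    return k_pair / (2.0 * rate)


def segment_age(pair_divergences, rate):
    """Age of a duplicated segment, taken here as the median of its pair ages."""
    return median(pair_age(k, rate) for k in pair_divergences)


# Hypothetical calibration: lineages known from fossils to have split
# 100 million years ago differ by 0.4 substitutions per site.
rate = calibrate_rate(0.4, 100.0)
# Hypothetical divergences for the gene pairs in two duplicated segments.
young = segment_age([0.18, 0.20, 0.22], rate)
old = segment_age([0.75, 0.80, 0.90], rate)
print(f"young segment: ~{young:.0f} Myr, old segment: ~{old:.0f} Myr")
```

Taking the median over all pairs in a segment dampens the effect of individual genes that evolved unusually fast or slow.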
"Our work was entirely computational, but a lot of other researchers'
laboratory work went into it before that," says Vision. He draws an analogy
between finding prehistoric genetic relationships and the development of
language. Many words in Romance languages like Spanish, Italian, French and
Portuguese are derived from Latin. "We can see the roots of the
modern words as being derived from Latin," he says. "In our case, we are
finding the genetic roots of the genes before they duplicated and diverged."
The paper, "The Origins of Genomic Duplications in Arabidopsis," was
authored by Vision, Tanksley and Daniel G. Brown of the Whitehead Institute at
the Massachusetts Institute of Technology. Brown participated in the
research while completing his doctoral degree, which he earned from the
Department of Computer Science at Cornell last spring. The research was funded
by the USDA Agricultural Research Service and by grants from the National
Science Foundation and the Office of Naval Research.
Related World Wide Web sites:
The following sites provide additional information on this news release.
Some might not be part of the Cornell University community, and Cornell
has no control over their content or availability.
o The Arabidopsis Information Resource:
<http://www.arabidopsis.org/home.html>
o Todd Vision page:
<http://www.igd.cornell.edu/~tvision/ToddVision.html>
o Cornell Theory Center: <http://www.tc.cornell.edu>
o National Center for Biotechnology Information (NCBI) BLAST site (somewhat technical):
<http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html>
o USDA-ARS Center for Agricultural Bioinformatics at Cornell:
<http://genome.cornell.edu/index.html>
How Cornell's computing resources help to "blast" against databases
Running a massive BLAST search on the Arabidopsis genome was easier
for Cornell University researcher Todd Vision than it might have been for
many other genomics researchers, thanks to the Cornell Theory Center
(CTC). The center maintains a special computing resource in Rhodes Hall on the
Cornell campus in conjunction with the U.S. Department of Agriculture's
Center for Agricultural Bioinformatics (CAB).
The resource, informally called the 'genomics cluster,' consists of 12
computers, each with four 500-MHz Pentium III processors running the
Windows 2000 operating system. With software developed for CTC, eight
of the machines run as a parallel-processing cluster, effectively a
supercomputer.
The cluster is used primarily to run BLAST (an acronym for Basic Local
Alignment Search Tool), a program that searches gene and protein databases
for similar sequences, much as a text search engine matches words and
phrases. Web-based BLAST servers elsewhere are open to the worldwide
research community, but they are not suited to running a large batch of
queries such as the one Vision used to track the genetic history of
Arabidopsis.
The BLAST server on the Web takes one query and "blasts" it
against the database, Vision explained. "But I needed to run twenty-something
thousand proteins. Imagine sitting there and clicking the mouse that many
times. Doing it on Theory Center computers allowed us to do it in a batch.
Doing it all on local computers allows you more speed, more flexibility and
less hands-on processing."
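Running "twenty-something thousand proteins" as a batch on a cluster largely comes down to splitting one large query file into pieces that separate processors can search at the same time. The minimal sketch below shows only that bookkeeping and is not the Theory Center's software. It assumes the proteins arrive as FASTA text and that each batch would be handed to a command-line BLAST job on its own processor; launching those jobs is not shown. The sequences and the batch count are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines before the first header are ignored
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_batches(records, n_batches):
    """Deal records round-robin into n_batches lists so each job gets a similar count."""
    batches = [[] for _ in range(n_batches)]
    for k, record in enumerate(records):
        batches[k % n_batches].append(record)
    return batches


def to_fasta(records, width=60):
    """Write records back out as FASTA text, wrapping sequences at `width` characters."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical protein sequences; a real run would read ~25,000 of them.
proteins = read_fasta(""">p1
MKTAYIAKQR
>p2
MSEQNNTEMT
FKVQ
>p3
MADEEKLPPG
""")
batches = split_batches(proteins, 2)
for n, batch in enumerate(batches):
    print(f"batch_{n}.fasta")
    print(to_fasta(batch), end="")
```

Round-robin dealing balances the number of queries per job. Balancing by total sequence length would even out run times more closely, since longer proteins take longer to search.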
On a Theory Center computer, Vision also assembled a special version of
the Arabidopsis genome database that would report the genomic location of
each protein a search found. The processing, he said, took only about
half a day on just one of the four-processor Pentium machines.
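Vision's special database reported where on the genome each matching protein lies. A comparable result can be sketched after the search by joining BLAST's tabular output with a table of gene locations. This is an illustration only, not how his database was built. The sketch assumes the 12-column tab-separated format that command-line BLAST can produce: query, subject, percent identity, alignment length, mismatches, gap openings, query start and end, subject start and end, e-value, bit score. The protein IDs and chromosome coordinates are hypothetical.

```python
def parse_hits(text, locations):
    """Parse 12-column tab-separated BLAST output and attach the genomic
    location of each subject protein (None if the table lacks it)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError(f"expected 12 tab-separated columns: {line!r}")
        hits.append({
            "query": fields[0],
            "subject": fields[1],
            "percent_identity": float(fields[2]),
            "evalue": float(fields[10]),
            "bit_score": float(fields[11]),
            "location": locations.get(fields[1]),
        })
    return hits


# Hypothetical location table: protein ID -> (chromosome, start coordinate).
locations = {"protB": ("chr1", 15230), "protC": ("chr4", 880410)}
blast_output = (
    "protA\tprotB\t78.5\t310\t62\t3\t1\t310\t5\t312\t1e-120\t420.0\n"
    "protA\tprotC\t41.2\t250\t140\t8\t20\t265\t30\t280\t3e-25\t110.0\n"
    "protA\tprotD\t30.0\t90\t60\t2\t100\t189\t1\t90\t0.5\t30.0\n"
)
for hit in parse_hits(blast_output, locations):
    print(hit["subject"], hit["evalue"], hit["location"])
```

Returning `None` for proteins missing from the location table keeps the hit visible instead of silently dropping it, which makes gaps in the table easy to spot.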
Other computers in the resource are used for databases. One is a
server for the CAB Web site known as Demeter's Genomes, which makes
extensive databases of plant genomes available to the research community.
The genomics cluster was established several years ago with about
$400,000 in funding from the USDA and is maintained by an annual USDA grant.
David Schneider, a Theory Center staff researcher, is the principal
investigator of the genomics cluster.
The web version of this release may be found at
http://www.news.cornell.edu/releases/Dec00/AncientGenes.bpf.html
Cornell University News Service
Surge 3
Cornell University
Ithaca, NY 14853
607-255-4206
cunews@cornell.edu
http://www.news.cornell.edu
Contact: Blaine P. Friedlander Jr.
Office: 607-255-3290
E-mail: bpf2@cornell.edu
Cornell news release
N3200