Chromosomes, which series in size from 50 million to 250 million bases, ought to first be broken down into a great deal shorter pieces. Every short piece is used as a template to produce a set of fragments that are different in extent from each other by a single base that will be recognized in a later step.
The wreckage in a set is divided by gel electrophoresis (separation step). New fluorescent dyes allow separation of all four fragments in a single lane on the gel. An example of an electropherogram using fluorescent dyes.
The last base at the conclusion of each fragment is recognized (base-calling step). This procedure recreates the original sequence of As, Ts, Cs, and Gs for each short piece generate in the first step.
Existing electrophoresis limits are about 500 to 700 bases sequenced per read. Automatic sequencers examine the resulting electropherograms and the production is a four-color chromatogram presentation peaks that represent each of the four DNA bases.
After the bases are "read," computers are used to assemble the short sequences (in blocks of about 500 bases each, called the read length) into long continuous stretches that are analyzed for errors, gene-coding regions, and other characteristics.
To read about all the trouble researchers go through to "finish" this raw sequence from automated sequencers. Over sequence is submitted to major public sequence databases, such as GenBank. Human Genome Project sequence data are thus made freely available to anyone around the world.
At what time sequencing a genome, there are more often than not regions that are hard to sequence Thus, 'completed' genome sequences are hardly ever ever complete, and terms such as 'working draft' or fundamentally complete' have been used to more precisely explain the status of such genome projects. Even when every base pair of a genome sequence has been dogged, there are still likely to be errors there because DNA sequencing is not a completely accurate process. It could also be argued that a complete genome project should include the sequences of mitochondria and (for plants) chloroplasts as these organelles have their own genomes.
It is frequently reported that the goal of sequencing a genome is to get information about the total set of genes in that particular genome series. The quantity of a genome that encodes for genes may be very little. However, it is not forever possible to only sequence the coding regions separately. Also, as scientists appreciate more about the role of this noncoding DNA (often referred to as junk DNA), it will become more significant to have a total genome sequence as a background to understanding the heredity and biology of any given organism.
In a lot of ways genome projects do not imprison themselves to only decisive a DNA sequence of an organism. Such projects may also comprise gene forecast to find out where the genes are in a genome, and what those genes do. There might also be related projects to sequence ESTs or mRNAs to help find out where the genes really are.