Vision for FISH-BOL
Fishes are the most diverse group of vertebrates; there may be as many as 30,000 different species. Because of their high diversity and profound changes in appearance during development, fish identification is no easy task. The use of DNA barcodes as a tool for species identification and discovery has now been widely documented, and the Fish Barcode of Life (FISH-BOL) initiative seeks to simplify fish identification through the use of these barcodes. The FISH-BOL initiative represents the first effort to assemble a global sequence library for such a diverse group of organisms.
Given the estimated $200 billion USD annual value of fisheries worldwide, FISH-BOL will address socially relevant questions concerning market substitution and quota management of commercial fisheries. For the discipline of ichthyology, FISH-BOL will provide a powerful tool for enhanced understanding of the natural history and ecological interactions of various fish species. The specimens collected and data generated from FISH-BOL will also contribute to an ongoing synthesis concerning the evolutionary history of the most diverse group of vertebrates on Earth. Finally, because the entire edifice of DNA barcoding collapses without accurate taxonomic identifications of reference specimens, the successful execution of FISH-BOL will serve as a powerful demonstration of the immense value of collections, museums and taxonomists to both science and society.
Background
Historical methods of identifying, naming and classifying fishes are largely based on visible morphology. Modern taxonomic work includes analysis of a host of other traits, including internal anatomy, physiology, behavior, genes, isozymes and geography; yet morphological traits remain the cornerstone of existing taxonomic treatments. However, there are limitations to relying primarily on morphology when attempting to identify fishes during various stages of their development not considered in original treatments, or when examining fragmentary or processed remains. Even when an intact adult specimen is the subject of identification, the morphological characters and other traits used to discern species are often so subtle and complex that each taxonomist can critically identify only a segment of the global fish fauna.
Multiple taxonomic experts are ordinarily required to identify specimens from even a single biotic survey. Assembling teams of appropriate experts, and/or distributing specimens to them for identification, are both time consuming and expensive tasks. Moreover, accessing existing literature and assessing the validity and priority of various taxon names can be a challenge even for the expert taxonomist. For the non-specialist faced with an assemblage of suboptimal specimens that require species identifications in real time, no method currently exists to bring the sum total of taxonomic knowledge to bear on the problem. This fact is a major impediment to the assessment, conservation and management of global fish biodiversity.
Technological innovation is being harnessed to address this challenge. Large-scale literature digitization projects are enhancing access to existing taxon treatments needed by the global community of taxonomic information consumers. Web-based databases that compile expert-vetted lists of valid taxonomic names and their synonymies, combined with online keys and high-resolution digital images, are further helping to summarize existing knowledge. However, these developments do not address identifications involving larval, juvenile, cryptic or fragmentary specimens.
One of the major benefits of DNA-based identifications is their fast, reliable and accurate characterization across all life stages and species. Early on, the use of DNA sequencing to survey diversity led to the recognition that libraries of reference sequences could be used for species identification in cases of morphological ambiguity, such as with larval stages (Olson et al., 1991). DNA, the basic code for all life forms, can be the substance that unifies biological collections of all sorts. In this respect, access to DNA sequences derived from expert-identified voucher specimens can be used to better characterize and broadly identify species. The ensuing catalog of unique genetic sequences or "DNA barcodes" can conceptually unite diverse assemblages of specimens, collections and associated species information under a common registry of sequence accessions. This will enhance online access to information about species and enable a broadly applicable reference database that is essential for performing DNA-based identifications on samples of unknown identity.
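The idea of identifying an unknown sample against a reference library can be illustrated with a minimal sketch. It assumes the sequences are already aligned to the same length, uses a simple uncorrected proportion of differing sites (p-distance), and applies an arbitrary 2% cut-off; the function names, the cut-off and the toy sequences are all hypothetical and are not taken from FISH-BOL or BoLD, which may work quite differently.

```python
from typing import Dict, Optional, Tuple


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences.

    Sites where either sequence has an ambiguous base (anything other than
    A, C, G or T) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def identify(
    query: str,
    reference_library: Dict[str, str],
    max_distance: float = 0.02,  # assumed cut-off, for illustration only
) -> Tuple[Optional[str], float]:
    """Return (species, distance) for the closest reference sequence.

    The species is None when even the closest reference is more divergent
    than max_distance, i.e. the library offers no convincing match.
    """
    best_species, best_distance = min(
        ((name, p_distance(query, ref)) for name, ref in reference_library.items()),
        key=lambda pair: pair[1],
    )
    if best_distance > max_distance:
        return None, best_distance
    return best_species, best_distance


# Toy library; real barcodes are hundreds of base pairs long.
toy_library = {
    "Species A": "ACGTACGTAC",
    "Species B": "ACGTTTGTAC",
}
print(identify("ACGTACGTAC", toy_library))
```

A sketch like this also shows why expert-identified vouchers matter: the answer can only ever be as good as the names attached to the reference sequences.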
Enabling Tools
For nearly two decades it has been recognized that rapidly evolving mitochondrial genes, punctuated with highly conserved regions, can be recovered via PCR and that the sequences of these regions allow broad phylogenetic application across the animal kingdom (Kocher et al., 1989). However, the generation of a DNA-based identification system places new demands. To be cost effective using existing technology, it is imperative to focus analysis on an easily recovered and standardized region of the genome that provides good taxonomic resolution within a single sequence read. The availability of broad range primers for the amplification of the 5′ region of cytochrome c oxidase subunit I (COI) from diverse phyla established this gene sequence as a particularly promising tool for species identification (Folmer et al., 1994). Hebert et al. (2003a, 2003b) have recently demonstrated that this gene region is highly appropriate for discriminating between closely related species across diverse phyla in the animal kingdom, establishing the 5′ end of COI as the "DNA barcode" locus for broadly identifying animals, including fish (Ward et al., 2005).
Fish comprise nearly half of all vertebrates, yet they are still a manageable group for demonstrating the utility of DNA barcoding, with approximately 20,000 marine and 15,000 freshwater species (FishBase). The real challenge is to establish an organizational infrastructure for the task and to develop clear sampling protocols. From an organizational standpoint, the existing species lists associated with nineteen marine and seven inland FAO statistical areas provide an appropriate starting point for directing regional teams with a goal of sampling five specimens from each species across each area. For certain species exhibiting broad geographic distributions perhaps as many as 25 specimens would be sequenced under this scenario. Given the rudimentary knowledge of existing species distributions combined with the nineteen marine FAO areas, an estimated 500,000 specimens will be needed for comprehensive barcoding of all fish species.
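The sampling arithmetic in this paragraph can be checked with a short sketch. The species counts, the goal of five specimens per species per area and the 500,000 total are the figures quoted above; the "mean areas per species" quantity is not stated in the text and is simply what those figures imply, so treat it as an illustrative derivation rather than a workshop estimate.

```python
SPECIMENS_PER_SPECIES_PER_AREA = 5  # sampling goal stated in the text
MARINE_SPECIES = 20_000             # approximate counts quoted from FishBase
FRESHWATER_SPECIES = 15_000
TARGET_SPECIMENS = 500_000          # overall estimate stated in the text


def specimens_needed(species_count: int, mean_areas_per_species: float) -> int:
    """Specimens required if each species is sampled in every area it occupies."""
    return round(
        species_count * mean_areas_per_species * SPECIMENS_PER_SPECIES_PER_AREA
    )


def implied_mean_areas(target: int, species_count: int) -> float:
    """Average number of FAO areas per species implied by a specimen target."""
    return target / (species_count * SPECIMENS_PER_SPECIES_PER_AREA)


total_species = MARINE_SPECIES + FRESHWATER_SPECIES
mean_areas = implied_mean_areas(TARGET_SPECIMENS, total_species)

print(f"Implied mean FAO areas per species: {mean_areas:.2f}")
print(f"Specimens for one species spanning 5 areas: {specimens_needed(1, 5)}")
```

Under these figures the 500,000 estimate corresponds to each species occurring, on average, in just under three FAO areas, and a species spanning five areas accounts for the 25 specimens mentioned above.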
There are many species in existing collections, although specimens that have been fixed in formalin are currently difficult to barcode. Thus many new specimens will need to be collected and archived. The barcode reference database must be populated using voucher specimens identified by experts and backed with archival DNA extractions. Delivery of tissues for DNA extraction is currently the rate-limiting factor for the FISH-BOL program, as high throughput sequencing systems are now in place for utilization by the network. DNA banks must be created to enable the deposition of genetic material for each sequence obtained by the initiative.
Assembling the sequence information into a comprehensive DNA barcode library requires the development of a Laboratory Information Management System (LIMS) capable of providing an audit trail for each barcode generated. This piece of software, which is under development at the University of Guelph, will extend the capabilities of the current Management and Analysis System (MAS), which relates a given barcode record to both a voucher specimen and to a broader set of sequences. The existing Barcode of Life Database (BoLD) serves this function and, among other options, generates Neighbor-Joining dendrograms of species’ barcodes in PDF format. The system can also plot specimen collection localities on a distribution map at a resolution of 1 km/pixel, and it further facilitates morphological comparison of voucher specimens when appropriate digital images (e.g. eVouchers, sensu Monk and Baker, 2001) are input.
An ongoing Japanese collaboration of the Fish Mitochondrial Research Group has generated whole mitochondrial genome sequences for an impressive number of fishes. The aim of this group is to develop a complete phylogeny of the fishes. The MitoFish database compiles both full and partial mitochondrial sequences of fishes and includes full sequences for 250 species that will soon expand to about 750 species. This data set will be a useful reference for primer development when recalcitrant species are encountered in the FISH-BOL analyses. MitoFish also has numerous links to other relevant biological and genetic databases.
FishBase, a ‘global public good’ developed as a decision support system for the conservation and management of aquatic biodiversity and ecosystems, will help anchor FISH-BOL in a peer-reviewed taxonomic framework of accepted species names. The need is apparent from the fact that there are over 200,000 common names for fishes distributed across 264 languages. FishBase currently recognizes approximately 28,000 valid species names and includes over 80,000 synonyms. FishBase maintains relevant literature citations and includes an identification tool based on morphology, complete with digital images of representative specimens. Clearly, a tight integration of FishBase and FISH-BOL will be critical.
The Integrated Taxonomic Information System (ITIS) represents another initiative to create an easily accessible database with reliable information on species. This program involves a memorandum of understanding among several US federal agencies. In 2001, ITIS joined forces with the UK-based Species 2000 to develop the Catalog of Life, which now boasts a nearly complete inventory of fishes. Standard reports include classification, geography and links to other databases, including publications in BioOne and genomic data in GenBank. This taxonomic source for biodiversity information is currently available in four languages on the web. ITIS will be used as the vetted source for valid species names among fishes as part of the FISH-BOL initiative.
The National Center for Biotechnology Information (NCBI) maintains the GenBank and NCBI Taxonomy Browser databases and has offered its support for the aims of FISH-BOL. In fact, GenBank has fostered broad support for barcoding among the other members of the genomics collaborative, which include the DNA Data Bank of Japan (DDBJ) and the European Molecular Biology Laboratory (EMBL) database. The collaborative has agreed to publicly archive DNA sequences from the FISH-BOL project. Its members have also expanded the fields for core specimen annotation in their database architecture to serve barcoding more effectively. These changes relate primarily to information about the voucher specimen from which sequences are derived. GenBank and the collaborative have agreed to annotate sequences with the keyword "BARCODE" when they meet the appropriate guidelines, which include: a valid species name; at least 500 bp of double stranded sequence (with fewer than 1% ambiguous base-calls) derived from the 5′ end of the COI gene; and reference to a structured record for the voucher specimen (see below) from which the sequence was derived. Records can also include information on coordinates for the collection locality, collection date, collector, and person who performed the identification. In addition, barcode entries include reference to the PCR primers used to generate the sequence and can link to the raw data or ‘trace files’ of the sequences themselves, when, as strongly recommended, the traces are deposited in the NCBI Trace Archive. GenBank also makes links from specific sequences to specialty databases containing specimen data, literature, and taxonomic databases, including FishBase and ITIS.
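The guidelines for the "BARCODE" keyword can be read as a simple checklist. The sketch below is one way to express it, under several assumptions: the record layout, the field names and the "COI-5P" region label are hypothetical choices for illustration and are not GenBank's actual submission format. It only checks that a species name and voucher reference are present; confirming that a name is valid would require a lookup against a taxonomic database, and double-stranded coverage cannot be judged from the sequence string alone.

```python
from dataclasses import dataclass
from typing import List

MIN_LENGTH_BP = 500            # "at least 500 bp" per the guidelines above
MAX_AMBIGUOUS_FRACTION = 0.01  # "fewer than 1% ambiguous base-calls"
UNAMBIGUOUS_BASES = set("ACGT")


@dataclass
class BarcodeRecord:
    """Hypothetical record layout used only for this illustration."""
    species_name: str
    sequence: str
    gene_region: str        # assumed label, e.g. "COI-5P" for the 5' end of COI
    voucher_reference: str  # structured reference to the voucher specimen


def ambiguous_fraction(sequence: str) -> float:
    """Fraction of base calls that are not A, C, G or T."""
    seq = sequence.upper()
    if not seq:
        return 1.0
    ambiguous = sum(1 for base in seq if base not in UNAMBIGUOUS_BASES)
    return ambiguous / len(seq)


def barcode_problems(record: BarcodeRecord) -> List[str]:
    """List the guideline checks a record fails; an empty list means it passes."""
    problems = []
    if not record.species_name.strip():
        problems.append("missing species name")
    if len(record.sequence) < MIN_LENGTH_BP:
        problems.append("sequence shorter than 500 bp")
    if ambiguous_fraction(record.sequence) >= MAX_AMBIGUOUS_FRACTION:
        problems.append("1% or more ambiguous base calls")
    if record.gene_region != "COI-5P":
        problems.append("not from the 5' end of COI")
    if not record.voucher_reference.strip():
        problems.append("no voucher specimen reference")
    return problems


# Made-up 600 bp sequence and voucher code, for demonstration only.
example = BarcodeRecord("Thunnus maccoyii", "ACGT" * 150, "COI-5P", "EXAMPLE:12345")
print(barcode_problems(example))
```

Returning a list of failed checks, rather than a single yes/no, makes it easier for a submitter to see what must be fixed before a record qualifies.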
The existence of standard symbolic codes for institutional resource collections in ichthyology (Leviton et al. 1985; http://www.asih.org/codons.pdf), paired with the combined taxonomic treatments of FishBase and ITIS, provides an excellent organizational framework for conducting the FISH-BOL campaign. The above ASIH list continues to be updated by Bill Eschmeyer. GenBank will use this information as a source for developing a structured reference to voucher specimens held in existing reference collections, and will also vet barcode submissions against the taxonomic databases to confirm the validity of names associated with submitted barcode sequences.
Regional Perspectives
Fisheries experts working in various regions of the world presented information concerning ongoing programs in their region. Collectively, they presented a global view of fish biodiversity, suggesting that the initial representation of researchers in FISH-BOL is of a sufficient critical mass to commence the project’s stated objective of barcoding all fish.
North American fishes have the benefit of being the best-known, with several recent compilations available to guide the FISH-BOL effort. Scripps is compiling a DNA Bank of California fishes. They have also had recent success in developing a technique to obtain short sequences from formalin preserved specimens, which is proving useful for linking barcode OTUs with named specimens held in reference collections.
Barcoding specimens from Central America, particularly the Isthmus of Panama, will provide an excellent opportunity to compare patterns of morphologic and genetic evolution. Some taxa occurring on either side of the isthmus are considered to be conspecific while other similar species are not. There must be some equivalency in making such taxonomic designations, and divergence measures derived from molecular data can help resolve existing discrepancies.
Taxonomic work on South American species is ongoing and the University of Concepcion will establish a regional genetic resource center that could help serve DNA barcoding. Funds are available for defining existing stocks and to support training visits. Capacity building is a major priority for this region and interested researchers should consider becoming involved with the initial phase of FISH-BOL for this region.
European waters have seen a regional decline in species diversity, likely the result of commercial exploitation. A number of relevant EU consortia are well positioned to aid FISH-BOL, including the Fishtrace project and Fish and Chips, which is an array-based approach to species identification. The role of barcoding in management efforts involving local EU fisheries could be exemplified through studies of larval dispersal.
The Oceania region provides an opportunity for generating barcodes from over 6,100 species. Planned work in this region includes the BioCode project, with efforts to barcode the flora and fauna of Moorea and its surrounding waters. Such an initiative will demonstrate many exciting applications of barcoding, including projects aimed at increasing our understanding of community assemblies and food webs.
New Zealand and Antarctica offer special opportunities for barcoding, as this region contains both cryptic and cosmopolitan species. Barcoding fishes of this region will enable efficient detection of catch substitutions, where low value species are substituted for high value species in the market, and will also extend to the detection of quota substitutions. These are universal benefits of fish barcoding. For example, southern bluefin tuna are regulated under a quota in New Zealand waters while northern bluefin are not. A single bluefin tuna can be worth as much as $50,000 USD, which provides a strong commercial incentive to mislabel southern bluefin catches. Antarctic waters represent about 10% of the world’s ocean, yet only about 300 species are known from this area, suggesting these waters might be a lower priority for FISH-BOL given the expense of collecting in them. However, many of these species will likely be collected via a new CoML project focussed on Antarctic waters.
Australian waters range from tropical and temperate to subantarctic. While the region contains a large water mass, it is of low productivity. About 4,500 species, or 25% of all marine species, occur in Australian waters, including many endemics. CSIRO has already obtained samples from some 550 species, including 450 that are commercially harvested. An additional 200 species of freshwater fishes must also be considered. In Australia, the recently established National Oceans Office has been charged with developing a plan for the sustainable use of fisheries, which should drive a local interest in barcoding to aid management decisions.
The fishes of Asian waters have been the target of intensive genetic work, largely led by the Fish Mitochondrial Research Group in Japan. They have as one of their aims the collecting and archiving of about 80% of the fishes in Japan (about 4000 species), with vouchers located primarily in the Natural Science Museum, Tokyo. While their work will focus on characterization of ND4-5, COI sequences could be collected to aid the FISH-BOL campaign.
Russian waters include approximately 568 marine species and about 400 freshwater species. Russia has taxonomic experts for 22 families who have agreed to help with the problem of specimen collection and identification. They are interested in participating and estimate that collection costs would average about $10 USD per specimen, although projected costs to sequence DNA locally are higher than that.
Indian waters contain approximately 1,500 species. India relies heavily on fisheries, harvesting some 6 million metric tons annually from over 400 species. Several institutions in India are poised to support FISH-BOL and identifiable sources of funding exist for this sampling program. The Zoological Survey of India can help with taxonomic identifications and has plans for a national fish museum that could serve as a regional archive for voucher specimens. The possibility also exists to establish an exchange program for training other Asian colleagues.
The fishes of inland waters in Africa and Madagascar are being surveyed by teams from the American Museum of Natural History and the Royal Belgian Academy, while the South African Institute of Marine Biodiversity has played an early role in archiving marine fishes from this region that have been used in barcode pilot studies. The ongoing surveys include sequencing of COI for selected specimens and the collection of digital images for voucher specimens that will soon be publicly available.
Administrative Structure
A potential administrative structure for the FISH-BOL campaign was discussed at the workshop. It was decided that the primary work would be led by ten Working Groups that would take responsibility for overseeing collections, identifications and barcoding of the fish faunas in their region.
These regional Working Groups (WGs) included:
- Africa
- Australia
- Europe/Russia
- Indian subcontinent/Central Asia
- Northeast Asia
- Southeast Asia
- Oceania/Antarctic
- South America
- Meso America
- North America
Each WG will include both fresh water and marine partitions. A number of individuals at the workshop expressed a willingness to participate in the formation of the WGs and the FISH-BOL co-organizers will soon seek individuals to act as an interim chair until each region can call a meeting of its members. Chair announcements will be posted to the campaign website as they are established.
The WGs will each assemble a team of researchers, nominate a leader, review the list of species generated for their area using FAO data (with FishBase assistance), keep records of barcoding involvement (collections, vouchers, sequencing), minimize duplication, and seek funding.
The global FISH-BOL campaign will be overseen by a Scientific Committee with 14 members:
- Co-Chairs (Paul Hebert and Bob Ward)
- Campaign Coordinator (Robert Hanner)
- Taxonomic Coordinator (TBD)
- 10 WG Chairs (TBD)
Among other duties, the Scientific Committee will synthesize WG reports and generate a summary report on overall progress. It will provide advice to FISH-BOL members, organize the next global meeting, seek funding to support FISH-BOL administration, and provide informal linkage and communication with OBIS, CoML and other organizations with a stake in the FISH-BOL campaign.
References
Folmer et al. 1994. DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Mol. Mar. Biol. Biotech. 3:294-299.
Hebert et al. 2003a. Biological identifications through DNA barcodes. Proc. R. Soc. Lond. B Biol. Sci. 270:313–321.
Hebert et al. 2003b. Barcoding animal life: Cytochrome c oxidase subunit 1 divergences among closely related species. Proc. R. Soc. Lond. B Biol. Sci. 270: S596–S599.
Leviton et al. 1985. Standards in ichthyology and herpetology: Part I. Standard symbolic codes for institutional resource collections in herpetology and ichthyology. Copeia 1985:802-832.
Olson et al. 1991. Whose larvae? Nature 351:357-358.
Ward, R.D., Zemlak, T.S., Innes, B.H., Last, P.R., Hebert, P.D.N. 2005. Barcoding Australia’s fish species. Phil. Trans. R. Soc. B 360:1847-1857.

