SeedQuest - Central information website for the global seed industry

Data integration or die: the importance of biologist input in efficiently sharing data

Norwich, United Kingdom
October 1, 2015

Vicky Schneider of the 361° Division (Training, Public Engagement, Best Practice & e-Science) at The Genome Analysis Centre (TGAC), together with UK and European partners, has reviewed key standards and formats for biological data to highlight the importance of data integration and management tools for biologists.

Structural standards for data formats are critical to the value of analyses: they affect retrieval, sharing, validation, reproducibility and, in particular, integration and interpretation.

Integrating data is imperative for the advancement of research; blending results of diverse disciplines is often an essential step in answering meaningful biological questions. To achieve this, standards should be implemented at the source of the data for the sake of efficiency, particularly since the datasets are constantly increasing in size, and it may be almost impossible to achieve unification further downstream.

To engage the biologist community, the paper aims to familiarise experimental biologists with the definitions and terms used by computational biologists, and so foster cooperation towards cohesive data flow pipelines. It identifies four main classes of data format (tables, FASTA, GenBank and tag-structured), a major step in defining how the multitude of existing formats might be curated.

Data integration in biological research centres on the adoption of standards, which promise easier conversion between data and file formats. The scale and infrastructure of a given database determine whether it should be stored in a centralised or a distributed manner: centralised storage is harder to update, while distributed storage is harder to query. Either way, when the data need to be integrated further with other data, the computational burden of unifying formats should be eased wherever possible.
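As an illustration of conversion between two of the format classes named above, the following is a minimal Python sketch that reads FASTA text and writes a tab-separated table. It is not taken from the paper: the function names (`parse_fasta`, `fasta_to_tsv`), the column layout and the example records are all hypothetical, and the parser assumes simple FASTA in which the header's first word is the identifier.

```python
import csv
import io


def _make_record(header, chunks):
    # Assumption: the first whitespace-delimited word of the header is the ID,
    # and anything after it is a free-text description.
    identifier, _, description = header.partition(" ")
    return identifier, description, "".join(chunks)


def parse_fasta(text):
    """Yield (identifier, description, sequence) for each FASTA record."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield _make_record(header, chunks)


def fasta_to_tsv(text):
    """Convert FASTA text to a tab-separated table with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "description", "length", "sequence"])
    for identifier, description, sequence in parse_fasta(text):
        writer.writerow([identifier, description, len(sequence), sequence])
    return out.getvalue()


# Hypothetical example records, for illustration only.
EXAMPLE_FASTA = """>seq1 hypothetical example record
ACGTAC
GTTA
>seq2
GGCC
"""

if __name__ == "__main__":
    print(fasta_to_tsv(EXAMPLE_FASTA), end="")
```

Even this small conversion has to decide how to split the header into an identifier and a description, which is the kind of choice that is easier to make when a standard is agreed at the source of the data.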

Ideally, biologists should work with bioinformaticians and computer scientists and become more involved in standardising their data structures, which would reduce the ongoing burden of managing databases and writing programming tools to parse data. This would give biological research a more robust structure for data analysis.

Senior Author, Dr Vicky Schneider, Head of the 361° Division at TGAC, said: “Data integration should not just rely on software engineers and computational scientists, but needs to be driven by the actual users whose communities need to define, adopt and use standards, ontologies and annotation best practice. Therefore, it is particularly important for the biological research community to get acquainted with the conceptual basis of data integration, its limitations, challenges and terminology.”

Senior Author, Dr Allegra Via, Assistant Professor in the Biocomputing Group of Sapienza, University of Rome, added: “The importance of biologists in data integration is huge. They are those who produce and analyse data, which need to be shared for a better science. There cannot be data sharing without good practice in data integration.”

The paper, titled “Data Integration in Biological Research: An overview”, is available via PubMed. The publication is a collaborative effort between TGAC, the Department of Informatics at Ionian University, the ELIXIR Hub and the Biocomputing Group at Sapienza University.

TGAC is strategically funded by BBSRC and operates a National Capability to promote the application of genomics and bioinformatics to advance bioscience research and innovation.

More news from: Earlham Institute

Website: http://www.earlham.ac.uk

Published: October 1, 2015
