BBSRC Portfolio Analyser
Award details
Big Data approaches to host-pathogen mapping: EID2 - an open-access, taxonomically- and spatially-referenced database of pathogens and their hosts
Reference
BB/N02320X/1
Principal Investigator / Supervisor
Professor Matthew Baylis
Co-Investigators / Co-Supervisors
Dr (Kirsty) McIntyre, Dr Maya Wardeh
Institution
University of Liverpool
Department
Institute of Infection and Global Health
Funding type
Research
Value (£)
150,891
Status
Completed
Type
Research Grant
Start date
01/10/2016
End date
31/03/2018
Duration
18 months
Abstract
The ENHanCEd Infectious Diseases database (EID2) is a novel, open-access database of pathogens and their hosts. Key aspects are that:
1. It is populated using automated procedures, so that it is regularly updated as new data become available.
2. It is built on the NCBI taxonomy tree. All pathogens and their hosts or vectors are taxonomically labelled, so that queries can be run at the species level, or at genus, family, order, etc.
3. It is populated from two main sources: (a) the NCBI sequence database, from which we have extracted information from the metadata of >20 million nucleotide sequences; and (b) PubMed, from which we have extracted information from the titles and abstracts of >6 million papers.
4. EID2 stores spatial locations at the national and sub-national level. It produces maps based on nucleotide metadata, PubMed, or both.
5. Currently, EID2 is most comprehensive for human and domestic animal pathogens, but the automated procedures mean it also stores information on wildlife, fish and plant pathogens.
6. All information is linked to evidence, i.e. a paper or nucleotide upload.
7. The scale is huge and increasing. EID2 currently stores information on >100,000 species of helminth, fungi, protozoa, bacteria and virus (i.e. the major groups that contain pathogens); it has records for >170,000 species of organism, obtained from (and linked to) >20 million nucleotide sequences; and 7500 species of organism obtained from (and linked to) >6 million papers.
EID2 has the potential to become a major resource for health researchers and professionals worldwide, for disease mapping, risk assessment and more. We seek funding to extend EID2 to crop plant pathogens, add a new data stream for notifiable animal diseases, update it as new data become available, improve its comprehensiveness, functionality and speed, monitor the accuracy of uploaded data, allow users to request new functionality and download bespoke outputs, and promote its use to research and other communities.
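The taxonomic labelling described in the abstract (queries at species level or at genus/family/order) can be illustrated with a minimal sketch. This is not EID2's schema or code: the `TAXONOMY` table, the `Interaction` record, the function names, the pathogen names and the evidence identifiers are all hypothetical placeholders chosen only to show how a host-pathogen record stored at species level can be retrieved at any higher rank by walking up a taxonomy tree.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy taxonomy (taxon -> (rank, parent)). Illustrative only;
# this is not the NCBI taxonomy dump and not EID2's internal structure.
TAXONOMY: dict[str, tuple[str, Optional[str]]] = {
    "Carnivora": ("order", None),
    "Canidae": ("family", "Carnivora"),
    "Canis": ("genus", "Canidae"),
    "Canis lupus": ("species", "Canis"),
    "Vulpes": ("genus", "Canidae"),
    "Vulpes vulpes": ("species", "Vulpes"),
    "Felidae": ("family", "Carnivora"),
    "Felis": ("genus", "Felidae"),
    "Felis catus": ("species", "Felis"),
}


@dataclass(frozen=True)
class Interaction:
    """One host-pathogen record, always linked to a piece of evidence."""
    pathogen: str   # placeholder pathogen name
    host: str       # host taxon, recorded at species level
    evidence: str   # placeholder ID for a paper or sequence record


# Placeholder records; names and evidence IDs are invented for the sketch.
RECORDS = [
    Interaction("pathogen A", "Canis lupus", "sequence:placeholder-1"),
    Interaction("pathogen B", "Vulpes vulpes", "paper:placeholder-2"),
    Interaction("pathogen C", "Felis catus", "paper:placeholder-3"),
]


def lineage(taxon: str) -> list[str]:
    """Return the taxon followed by all of its ancestors up to the root."""
    chain = []
    current: Optional[str] = taxon
    while current is not None:
        chain.append(current)
        current = TAXONOMY[current][1]
    return chain


def pathogens_of(taxon: str, records: list[Interaction]) -> list[str]:
    """Pathogens recorded for the taxon itself or any host beneath it."""
    return sorted({r.pathogen for r in records if taxon in lineage(r.host)})


if __name__ == "__main__":
    for query in ("Canis lupus", "Canidae", "Carnivora"):
        print(query, "->", pathogens_of(query, RECORDS))
```

The same roll-up would apply to pathogens (for example, all hosts of a viral family) if pathogen taxa were added to the tree; that extension is left out to keep the sketch short.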
Summary
What are all of the known pathogens of humans, and those of the animals that associate closely with us, and hence might spread to us? What are all of the known pathogens of the plants that we eat? And where in the world are these pathogens found? The general public may be surprised that answers to these questions are hard to obtain. A recent (2013) estimate is that only about one in fifty human diseases has been comprehensively mapped, and the situation for animal and plant diseases is probably worse. In recent years the University of Liverpool has created an open-access database of human and animal pathogens, called the ENHanCEd Infectious Diseases database or EID2. It stores information on pathogens, their hosts, and where both are found in the world, at national and sub-national (i.e. state or region) levels. All data are linked to evidence. The data entered into the database are obtained largely from three publicly available online resources: a taxonomy database (which describes which sort of organism a pathogen or host is: virus, insect, mammal, etc.); a nucleotide sequence database (which provides information on the hosts and locations of pathogens); and a publication database called PubMed. Importantly, the data are obtained from these sources by automated procedures, so that they can be regularly updated for relatively little effort. This matters because, for example, the number of nucleotide sequences entered into GenBank, one source of our information, is approaching 10 million per year, and over 1 million new papers are indexed annually by PubMed (NCBI statistics). We expect EID2 to become a major resource for people involved in health-related research, and other health professionals. The volume of data in EID2 is already large and, over time, as more data become available and are automatically entered into the database, we hope it will become more comprehensive, and the definitive source of pathogen/disease information.
EID2 offers numerous functions: identifying the pathogens of hosts, the hosts of pathogens, the known pathogens of a specific country or region, maps of the distribution of pathogens, and more besides. The aim of this proposal is to expand the database to include the pathogens of crop plants, add a new data stream for notifiable animal diseases (which will make it more comprehensive and timely), update it regularly as new data become available, increase its functionality and speed, allow users to request changes and download bespoke data outputs, continually assess its accuracy, and promote its use to research and other communities.
Impact Summary
EID2 stores information on the pathogens of humans, animals and plants. Human infectious disease, and its causes, is the concern of a very large number of non-academic organisations, from the W.H.O., the NHS, government ministries and agencies, and pharmaceutical companies to NGOs and charities. The same is true of animal and plant disease. Infectious diseases of livestock and crops are, to varying extents, the business of the government ministries and international organisations concerned with animal health (e.g. O.I.E.) and food security (F.A.O.), pharmaceutical companies, NGOs concerned with development, and charities concerned with natural disasters. Within government, the greatest relevance (in the UK) is to Defra and its agency, APHA, but livestock and plant diseases also touch on the Department of Health (zoonoses), the Department for Business, Innovation and Skills (commercial opportunities, economic costs), and the Ministry of Defence (bioterrorism). This broad relevance is demonstrated to some extent by the range of organisations which have commissioned livestock-centred reports from Baylis in recent years: the UK government's Foresight programme (2005), the Health Protection Agency (2010), the World Bank (2011), the US Department of Defense (2011) and the Smith School of Enterprise and the Environment (2011). EID2 stores equivalent information for wildlife and plant pathogens, indicating its relevance to organisations concerned with conservation and agriculture. We believe government and other organisations can already, or will shortly be able to, use EID2 for horizon scanning (for pathogens near to them, or present in specific trading countries, or most sensitive to climate), for information gathering in order to prepare briefings for government (on specific pathogens during an emergency or potential emergence event), or as a research tool for policy development for disease control.
Longer term, it may serve a function in disease surveillance (EID2 stores both time and space information), although its dependence on publications and sequence uploading means it cannot be, and is not intended to be, as responsive as, for example, ProMed or HealthMap. Many members of the public are interested in the pathogens that affect them and where they come from. Our aforementioned paper (ref [7] above), in addition to page views, has generated a very high level of online interest. With an altmetric score of 273, at the time of writing it is described as being in the 99th percentile (ranked 483rd) of >160,000 tracked papers of a similar age, and ranked 1st for those in Scientific Data. Recently, an output of the paper (Fig 1 in the Case for Support) appeared in a popular-science online forum on Facebook (I F...ing Love Science), with a readership of >22.5 million people; the feature received >10,000 Likes in 24 hours and hundreds of Shares, and led to a late surge in the number of page views of the paper itself.
Committee
Research Committee A (Animal disease, health and welfare)
Research Topics
Technology and Methods Development
Research Priority
X – Research Priority information not available
Research Initiative
Tools and Resources Development Fund (TRDF) [2006-2015]
Funding Scheme
X – not Funded via a specific Funding Scheme