Attendees:

Jode Edwards, Assibi Mahama, Carson Andorf, David Ertl, Carolyn Lawrence, Darwin Campbell, Pat Schnable (virtual), Jack Garnier (virtual), Ed Buckler and Sinta Srinivasan (virtually at ~10)

Background on attendees:
  • Jode Edwards-USDA/ARS maize breeder
  • Assibi Mahama-program assistance for eLearning for masters in plant breeding for Africa-has Gates foundation funding and works with IBP
    • IBP will be discussed, Walter Souza works on IBP outreach. Trying to get more crosstalk with two groups.
Overview:
  • Group is translating genomic information from reference and resequencing data.
  • They have some phenotyping data available; phenotyping was determined to be rate limiting. Trying to develop ways of analyzing phenotypic data with environmental and genome sequence data.
  • The idea is to apply genomic information in a way that makes it useful to growers.
  • G2F is a coordinated project. Phenotypic data are expensive to collect; plans for interpreting them are needed.
  • Trying to understand genetic control by environment
  • Public-private project (Rodney Williamson--also engaging private sector). Need to develop novel phenotyping strategies and analysis methods.
  • Data should be shared with community to enable folks to re-analyze and combine in new ways. Important training sets.
  • Interactions between genotype and environment complicate phenotype; breeders previously did experiments in controlled conditions. Group is taking a different approach: they want to embrace natural-world environments and genetic control.
  • Want to use big data to develop models to predict phenotypes based on GxE
  • They are generating large datasets and goal is to open this up to community to see improvement in crops.
  • Natalia deLeon is leading group generating data.
    • Locations across US with common set of hybrids and region specific hybrids, additional sites with inbreds.
    • Same model for collection.
    • SOPs for collecting generated.
    • At this stage, have a modest size dataset that is consistent to test varied approaches.
    • They are beginning to develop ways to analyze data.
Data collection
  • Using RTK-GPS--new approaches for collecting phenotypic data.
  • 3 ways of collecting: greenhouses, sensors, conveyor belts
    • Put sensors on robots (valid in field)
    • Trying to reduce the cost of sensors. They have stationary sensors in fields looking at plants (James Schnable built device that is working well and leverages a cell phone camera).
  • Have generated a stop-action movie (available): photos taken at 30-minute intervals during an intentional drought, so you can watch the plant begin to experience drought.
    • Different genotypes wilt differently. Can go in and identify allelic variation and follow the process of drought, seeing patterns over time rather than looking only at endpoints.
  • Not systematically collecting data in G2F. Very diverse data (some large); need to extract quantified data from these. No algorithms in place yet.
  • Want to enable researchers to develop hypotheses, then do controlled growth experiments based on data generated.
Plans
  • Next summer, large scale study planned (2015)
  • Smaller projects funded for folks to develop new tools-also trying to get funding to collect more data.
Data descriptions
  • 40 experiments (between hybrid and inbred locations) originally planned. Still have most (some weather and personnel issues).
  • Experimental design
    • Right now, centrally managed enterprise. SOPS, etc determined.
    • 2 methods of managing data (hybrid vs. inbreds).
    • Seed sent to investigators, seed planted in randomized designs.
    • Weather stations put in place (from time of planting most of the time).
  • Over summer, more intense phenotyping in inbred trial (because it was smaller).
  • One value per plot from each of hybrid and inbred plots.
    • In inbred study, some folks did a bit more intensive phenotyping.
    • Collected ears and shipped to UWI to get imagery for certain number of ears from every plot.
    • At some locations, depending on area and interest of investigator, measured more in addition to common set.
    • Some things are standardized, some things are not.
  • Hybrid experiments were 500 plots in each of the 21 environments.
  • Inbreds were 32 inbreds x 2 replications.
  • To date, have data from every hybrid location. Collated into a database. Have weather data from 23 weather stations.
  • Datasets on inbred studies still being put together.
  • On hybrids, have GBS data on both inbred and hybrid material; vast majority are collated and ready to go
    • Have genotyping-by-sequencing (GBS), weather station, inbred dataset and hybrid dataset from all sites
  • Central inventory of seed sources, sent to folks generating GBS data.
    • Generated genotypes to go along with inventory, that inventory is linked to hybrid study.
    • Not direct link to inbreds.
    • Weather stations linked through a location ID. Weather data-summarized rainfall and temp data on environments they have. Flowering dates converted to heat units.
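The notes do not say how flowering dates were converted to heat units. A minimal sketch, assuming the commonly used capped growing degree day method for maize (base 50 °F, cap 86 °F) and daily max/min temperatures from the weather station for that location; the function names and the input layout are hypothetical:

```python
from datetime import date

T_BASE_F = 50.0  # assumed base temperature (deg F), not stated in the notes
T_CAP_F = 86.0   # assumed upper cap (deg F), not stated in the notes


def daily_gdd(tmax_f, tmin_f):
    """Growing degree days for one day, clamping both temperatures to [50, 86] F."""
    tmax = min(max(tmax_f, T_BASE_F), T_CAP_F)
    tmin = min(max(tmin_f, T_BASE_F), T_CAP_F)
    return (tmax + tmin) / 2.0 - T_BASE_F


def heat_units_to_flowering(daily_temps, planting, flowering):
    """Sum daily GDD from planting through flowering (both dates inclusive).

    daily_temps maps a date to a (tmax_f, tmin_f) tuple. Days missing from
    the mapping contribute nothing, so gaps in station data undercount.
    """
    total = 0.0
    for day, (tmax, tmin) in daily_temps.items():
        if planting <= day <= flowering:
            total += daily_gdd(tmax, tmin)
    return total
```

Because stations were in place from planting only "most of the time", a real pipeline would need a rule for missing days (for example, filling from a nearby station) before summing.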
Community and procedure
  • Dispersed community. Have centralized design and database for inbred and hybrid; sent out standard file formats and SOPs for how to measure phenotypes, with experimental design built into files sent out.
    • Data returned adhered to desired template quite well.
  • Started with database at beginning of year with seed source inventory, pedigrees, experimental design.
  • Had a fairly complete template.
  • There was internal consistency to start with.
Group needs
  • Would like to get centralized location for emailing instructions out.
    • Maybe wiki? Project page, wiki page-what they will do is have vignettes for “how do you do this….” Can coordinate better.
  • Mailing list? do they need this?
    • Nicole to work with Darwin and Jack to get one established through iPlant for group (this way, we are a generic entity and it does not appear that one university holds the keys to the project).
  • Versioning of templates - subscribe to update feed for wiki for changes to pages.
    • Would be evolution of SOPS moving forward. New traits, new tools, would be a growth
    • New data types on annual basis.
    • Will need new SOPs for the datatypes on annual basis. Annual seasons may limit evolution through year. Stick to SOP for season. From season to season may evolve.
      • If you change too much, will need to call it a different trait.
    • SOP is mostly about how to measure things.
  • Metadata collection includes “how to plant, how to manage, what was environment”.
    • Return rate not excellent.
  • No notes on activities throughout season.
    • That metadata within projects is needed, but not adapted to technology very well.
    • What can they do to help individual projects with electronic lab notebooks, tablets, etc.
  • No specific SOPs for field management, but do want to know what happens.
  • MAJOR NEED - ELECTRONIC FIELDBOOK- IBP?
Current status from Carolyn’s perspective
  • Need CG Backoffice (development underway by Buckler's group)
  • Folks involved need to be Maize CGC, iPlant, MaizeGDB, IBP/BMS, Private industry
  • Goal of developing explicit use cases- get solutions to interoperate
  • Core traits collected: plant, ear, agronomic and productivity, weather station data. Have some drone information. Infrared images of field.
  • How to transfer traditional breeding data from individual system (PRISM) into maybe BMS? Maybe DivSeek
    • AN ASIDE: TRADITIONAL BREEDING DATA
      • plant height, leaf number, internode length- measurements
      • emerging datatypes-images of fields, etc.
      • More “traditional” phenotypes you would measure, they know how to do it, but not standardized. Looking to be able to integrate a few ways
      • Emerging datatypes: trying to constrain them to be taken in a consistent way may be a problem for the science.
    • Traditional: Jode's perspective
      • Capture managing seed, nurseries, yield trials. well established datatypes that go along with all of that.
  • Get long term knowledge bases engaged, train folks in GxE on how to use iPlant resources.
  • Will have high throughput phenotyping. Genotype needs environments codified, also want to compute on phenotypes across domains of life.
  • Want to create example datasets, engage with folks outside of domain, work with computer scientists, and be able to train and adapt folks to solutions
  • Want to take traditional breeding data and marry to genotyping data, bring in environment data, want to connect to places like MaizeGDB where there is deep knowledge for genes for what changes can be made.
  • Also add georeferencing and GIS systems.
  • Can develop shape files for each of these studies, know where plots map into field.
  • Connect georeferencing to GIS systems to get interpolated weather data. Maybe connect via STINGERS GROUP??
    • Line up their interpolated weather data with their weather stations as QC measure.
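The QC idea above (lining up interpolated weather data against the project's own weather stations) can be sketched as follows. This is only an illustration: the record layout keyed by location ID and date, the function name, and the threshold are all assumptions, not something the group specified.

```python
def qc_interpolated(station_obs, interpolated, max_abs_diff):
    """Flag days where interpolated weather disagrees with the station reading.

    Both inputs map a (location_id, iso_date) key to a numeric value such as
    daily mean temperature. Keys missing from the interpolated data are skipped.
    Returns a sorted list of (key, observed, estimate, abs_difference).
    """
    flagged = []
    for key, observed in station_obs.items():
        estimate = interpolated.get(key)
        if estimate is None:
            continue  # nothing to compare against for this location/day
        diff = abs(observed - estimate)
        if diff > max_abs_diff:
            flagged.append((key, observed, estimate, diff))
    return sorted(flagged)
```

A flagged day could point to a problem on either side (a faulty station or a poor interpolation), so the output is a list to review rather than an automatic correction.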
CG back office
  • Tool that breeders will interact with directly would be BMS (Breeding Management System).
    • Field collection data, tracking germplasm, what is the material that should be advanced for next generation. All goes into IBP/BMS.
  • Locally, been shifting their lab system over to BMS (Sinta leading this). Bunch of parts that don’t work for them.
    • Have gotten support from USDA to connect up new modules on BMS side.
    • Will be adding things to connect fieldbook app and HT remote sensing data will be put through that.
    • Over next year, maize diversity project will be following it and transitioning to BMS as field collection system.
  • CG back office-vast amounts of genomic/marker/phenotypic data, little computing going on to connect those things (not real time).
    • Name will be changed. Official contract signed a week ago.
    • Large set of programmers and the 3 CGs working with molecular breeders to build compute infrastructure for data, breeding value estimates will be returned to breeders in real time to know what varieties to advance.
    • Will take in low density markers, combine with high density of genotypes, hapmap and genome sequence data.
    • Will impute germplasm that comes in (try to go all the way to full genome sequence).
    • Will run estimates and predictions, will do GWAS and will provide simple trait characterization- will give breeding values based on selection indices.
    • Will give list similar to 23andMe with confidence scores.
    • Agile framework. 3 month deliverables, but also cycling and revisiting topics every year.
    • Each of major sections will be updated. Not a research initiative; this is a development project! Hiring director to run program. Taking existing algorithms and approaches and implementing in open source software.
  • BMS has some components that are difficult.
    • BMS going to a “pay model”? Components they care about are open source and won’t affect them.
  • BMS-have to launch from own computer. That is backward, should be able to launch it at iPlant and use it there.
  • Unlikely to use experimental design (Windows-based) or stat package (Windows-based). Can replace with open source or things are already managed (ex: exp. design, Jode handles this).
    • Hope is that as CG backoffice- can be engaged and involved in deploying - could go into iPlant and use these.
  • Database, field database design on BMS’s side
  • There will be useful code out of CGBO in about a year, after that point, should be consistent.
    • Database schemas will be discussed.
    • First module will be after input/output into database-imputation module, then a GS module.
  • Develop requirements, converge on standards, sharing that with G2F.
    • Standards for phenotypic collection done by CG. They started there, just got more explicit.
    • They are trying to hide the making of a genomic selection prediction.
    • Need to choose what training set should look like and tell them what they want an estimate for and selection index. Everything else should be hidden.
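The notes do not describe how the back office will compute a selection index. As a rough illustration of what a breeder would supply (trait weights) and get back (a ranked list), here is a sketch assuming a simple linear index over per-trait breeding value estimates; the trait names, weights, and function names are hypothetical:

```python
def selection_index(breeding_values, weights):
    """Linear index: weighted sum of per-trait breeding value estimates."""
    return sum(weights[trait] * breeding_values[trait] for trait in weights)


def rank_candidates(candidates, weights):
    """Return (name, index) pairs sorted from highest to lowest index.

    candidates maps a candidate name to a dict of trait -> breeding value.
    """
    scored = [(selection_index(values, weights), name)
              for name, values in candidates.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored]
```

A negative weight penalizes a trait (for example, grain moisture at harvest), which is how one index can trade traits off against each other.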
David Ertl and G2F project background
  • Iowa Corn Growers- works in small research group. Conducts research on production (agronomic). Most work in genetics. Have biotech program and phenotype initiative.
  • National Corn, Iowa Corn involved in sequencing. (2009)
  • Meeting in Chicago...what to do with sequence going forward. Lots of folks from industry and public sector. Needed to develop functional genomics piece. Discussions continued, some change at National Corn--in 2011, realized phenotyping was needed. Continue to push on this. MaizeGDB did survey every year. Knew there was a need, but very complex to address.
  • 2013 corn genetics meeting, NCC167 meeting before this. Chalk talks-can give two slides about whatever. Ed gave talk on GxE proposal. Dave followed with phenotyping and suggested that they do the experiment. Took short amount of time to get it going.
    • Just do it. Fairly large scale trial.
  • 15 different states, folks work independently. Collaborative nature was huge success. Community building exercise more than anything the first year, data is a bonus.
  • Iowa Corn Growers committed multiple funding avenues-committed another ½ million next year to provide support for GxE trial. Backbone of this larger initiative. Have to have plots and data to build components, also to get outside folks interested.
  • Remote sensing, could use this as validation
    • LOOKING TO BUILD A SYSTEM. Deposit and extract corn phenotype data and link with genomic data, people contribute and draw from it to figure out ways to use it
    • DATA COMMONS FOR THIS GROUP IS REQUIRED!!!!!
  • G, P, Environmental, remote sensing data, etc. all connected
    • Crop modelers can leverage this, bringing in broader information to complete networks, etc.
    • Trying to mimic what they did with genome database - no more earmarks. Have to be in the budget process now. Goal is to raise funds to get a lobbyist- want $25 million a year to do this. (going to phenotyping)
MaizeGDB
  • Interested in what datasets can they directly integrate, how can we improve interoperability and where they fit in
  • History-over 20 years of data and code incorporated. 2003 MaizeGDB created. Added sequence centric resources.
    • Interest in deploying tools to get better access to the data. Now folks want to get information from gene lists, identify patterns or enrichment against genes or gene models-create and do analysis of lists.
    • Community hub for working with maize genetics conference
  • Currently have B73 reference genome, at least a dozen others being sequenced, assembled and annotated.
    • How to integrate and represent?
  • Working with community to come up with nomenclature and standards on how to name assemblies, annotations, best practices to enable utilization
  • Working on metabolic pathway viewers
    • Intermine being used to create MaizeMine instance- (like Thalemine). Access to bulk data.
    • Improvements to speed and access, how folks interact with different datasets
  • On 3rd release of genome, new annotation coming out
  • Carson: wants to understand how to enable MaizeGDB similar to what CoGe did for AIP.
    • Set up call with CoGe team and Carson.
  • GRC (Genome Reference Consortium) is looking to bring the maize genome into their data model; they have good tools to represent alternative loci (high diversity, CNV) in maize
    • MO17, large regions can’t be anchored in B73-represent this on reference assembly and put features on top of regions.
  • Also allow community members to report problems. Problems can be shown as patches on gene model. Issue tracking software in place and can give to next group doing full assemblies. Work on integrating genes and gene models. One common view.
  • Have quite a bit of information on phenotypes, metabolic pathway information.
  • Different variation images, associated loci. Links to different stocks
    • Challenges would be in high throughput imaging; would have to leverage iPlant.
      • Would it be just viewing data? Can render with BISQUE-extracting information and tying to metadata-that system is not in place in current schema.
      • HT phenotyping, will change quickly. MaizeGDB has been a place where you can get high quality, high confidence datasets, when you have emerging datatype, difficult to integrate something that is changing all the time. Want to have the emerging data accessible, but not served by it.
      • Will there be pre-analysis done? Mirrored Ed’s way of storing GBS data-provide web services so that websites and folks can programmatically access the data. Goal is to make this modularized so pieces can be leveraged or replaced as needed.
G2F project challenges
  • Image data-have fields/plots-all kinds of ways to “summarize pixels”, they haven’t tried these yet. Don’t know what will work yet.
  • Data challenges:
    • Keeping track of seed sources and pedigrees.
      • Everyone has sources of B73, but “source” or “stock”, where it came from, season, location, breeder identifies exactly where that seed was produced. Want to id both source and pedigree.
    • With pedigree-folks use different nomenclature for naming pedigrees. Someone needs a standard pedigree database recording where each pedigree came from (public) and its aliases: a smart way to identify ‘what someone might mean’, Google-like “did you mean…..”
      • Purdy notation-this was the standard (published in Crop Science in 1960s). Paper in 2011 (also in Crop Science) talked about writing of pedigrees and sharing that information.
  • Datastore available with standard known pedigrees and what they are going to use, if they had a way to show what the hybrid/backcross would look like. Give folks the ability to upload list of pedigrees and figure out connecting genotype to origin of the seed
    • Possible requirement of running GBS on seed prior to collecting data and store that with pedigree.
      • Possibility of TNRS-like infrastructure for this?
  • Where did you get the data?! Very important! A lot of phenotype data is on hybrids, will need to know who is the female.
    • Imprinting, epigenetic information that would be important.
    • Need a pedigree database of some type to record female and male parent.
    • Some folks still do things with synthetic populations.
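The “did you mean” lookup and the female/male record discussed above could look roughly like this. It is a sketch under stated assumptions: the alias table is made up for illustration, matching uses Python's standard `difflib`, and only a simple `female/male` cross (female written first, as in Purdy-style notation) is parsed; backcrosses and more complex pedigrees are not handled.

```python
import difflib

# Hypothetical alias table: canonical inbred name -> alternate spellings.
ALIASES = {
    "B73": ["B 73", "B-73"],
    "Mo17": ["MO17", "Mo 17"],
    "PHZ51": ["PH Z51"],
}


def _norm(name):
    """Uppercase and drop everything except letters and digits."""
    return "".join(ch for ch in name.upper() if ch.isalnum())


# Map every normalized spelling back to its canonical name.
LOOKUP = {}
for canonical, spellings in ALIASES.items():
    for spelling in [canonical] + spellings:
        LOOKUP[_norm(spelling)] = canonical


def did_you_mean(name, cutoff=0.6):
    """Return canonical names matching `name`: exact alias first, else fuzzy."""
    key = _norm(name)
    if key in LOOKUP:
        return [LOOKUP[key]]
    close = difflib.get_close_matches(key, list(LOOKUP), n=3, cutoff=cutoff)
    suggestions = []
    for match in close:
        if LOOKUP[match] not in suggestions:
            suggestions.append(LOOKUP[match])
    return suggestions


def parse_simple_cross(pedigree):
    """Split a single 'female/male' cross and resolve each parent name."""
    female, male = pedigree.split("/")
    return {"female": did_you_mean(female), "male": did_you_mean(male)}
```

For example, `did_you_mean("M017")` (zero typed for the letter O) suggests `Mo17`, and `parse_simple_cross("B73/Mo 17")` records B73 as the female parent. Fuzzy matching can also suggest the wrong line when two real inbreds have similar names, so suggestions should be confirmed by a person, or by GBS on the seed as proposed above.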
BISQUE demo:
  • Image of field, can you map pixels for a plant range for what is there? Image in addition to geospatial mapping
  • Has anyone been using the map function in BISQUE? Mapping different information within a field?
    • Analyzing the spatial information may be out of scope for Bisque. Bisque can show the information of where it is, but would need to analyze it within geospatial platform
  • Want to be able to have comparison across environments, look at images of genotype in field in those environments-look from GPS location perspective
    • FOLLOW UP ON WHAT SORTS OF MODELING NEEDS GROUP HAS
  • Questions for follow up:
    • Tagging images in Bisque would translate over to broader need. Group doing flyovers that Ramona knows of might be similar
    • Saving project to local machine, what goes into XML files?
    • Is Bisque metadata associated with file in data store?
iPlant demos given