Conversion can be done by R/QTL, TASSEL and ANCESTRYMAP

ANCESTRYMAP can convert among the following formats:

  • ANCESTRYMAP
  • EIGENSTRAT
  • PED
  • PACKEDPED
  • PACKEDANCESTRYMAP

http://genepath.med.harvard.edu/~reich/InputFileFormats.htm

R/QTL supported format (read.cross, write.cross)

CSV format: The input file is a comma-delimited text file. A different field separator may be specified via the argument sep (which will be passed to the function read.table). For example, in Europe, it is common to use a comma in place of the decimal point in numbers and so a semi-colon in place of a comma as the field separator; such data may be read by using sep=";" and dec=",".

  1. The first line should contain the phenotype names followed by the marker names. At least one phenotype must be included; for example, include a numerical index for each individual.
  2. The second line should contain blanks in the phenotype columns, followed by chromosome identifiers for each marker in all other columns.
  3. An optional third line should contain blanks in the phenotype columns, followed by marker positions, in cM. Marker order is taken from the cM positions, if provided; otherwise, it is taken from the column order.
  4. Subsequent lines should give the data, with one line for each individual, and with phenotypes followed by genotypes. If possible, phenotypes are made numeric; otherwise they are converted to factors. The genotype codes must be the same across all markers. For example, you can’t have one marker coded AA/AB/BB and another coded A/H/B. The cross is determined to be a backcross if only the first two elements of the genotypes string are found; otherwise, it is assumed to be an intercross.
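
To make the row layout concrete, here is a minimal Python sketch that writes a tiny table in the CSV layout described above and reads the three header rows back. The phenotype names, marker names, and values are invented for illustration, and the helper describe_header is hypothetical; it is not part of R/qtl and assumes the optional cM row is present.

```python
import csv
import io

# Hypothetical toy data: 2 phenotypes, 3 markers, 2 individuals.
rows = [
    ["id", "weight", "m1", "m2", "m3"],  # line 1: phenotype names, then marker names
    ["", "", "1", "1", "2"],             # line 2: blanks, then chromosome identifiers
    ["", "", "0.0", "12.5", "3.1"],      # line 3 (optional): blanks, then cM positions
    ["1", "20.3", "A", "H", "B"],        # one line per individual
    ["2", "18.9", "H", "H", "A"],
]

buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(rows)
csv_text = buf.getvalue()


def describe_header(text):
    """Return (phenotype names, {marker: (chromosome, cM position)}).

    Assumes the optional third (cM) line is present.
    """
    names, chroms, positions = list(csv.reader(io.StringIO(text)))[:3]
    # Phenotype columns are the ones left blank in the chromosome row.
    n_pheno = sum(1 for c in chroms if c == "")
    markers = {
        name: (chrom, float(pos))
        for name, chrom, pos in zip(
            names[n_pheno:], chroms[n_pheno:], positions[n_pheno:]
        )
    }
    return names[:n_pheno], markers


phenotypes, markers = describe_header(csv_text)
```

In this toy file the genotype codes A/H/B are used for every marker, in line with the rule that codes must be the same across all markers.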

CSVr format: This is just like the csv format, but rotated (or really transposed), so that rows are columns and columns are rows.
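
The rotation can be sketched in a few lines of Python. This is only an illustration of the transposition, with made-up file content and a hypothetical helper name (rotate); it assumes a rectangular table and is not how R/qtl itself reads the file.

```python
import csv
import io

# Hypothetical csv-format content: names row, chromosome row, two individuals.
csv_text = "id,m1,m2\n,1,1\n1,A,H\n2,H,B\n"


def rotate(text):
    """Transpose a rectangular comma-delimited table: rows become columns."""
    table = list(csv.reader(io.StringIO(text)))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(zip(*table))
    return out.getvalue()


csvr_text = rotate(csv_text)
```

Applying rotate twice returns the original table, which is a quick sanity check that nothing was lost in the conversion.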

CSVs format: This is like the csv format, but with separate files for the genotype and phenotype data.

  1. The first column in the genotype data must specify individuals’ identifiers, and
  2. There must be a column in the phenotype data with precisely the same information (and with the same name).
  3. These IDs will be included in the data as a phenotype. If the name id or ID is used, these identifiers will be used in top.errorlod, plot.errorlod, and plot.geno as identifiers for the individual.
  4. In the genotype data file, the second row gives the chromosome IDs. The cell in the second row, first column, must be blank. A third row giving cM positions of markers may be included, in which case the cell in the third row, first column, must be blank. There need be no blank rows in the phenotype data file.
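
A pre-flight check of the ID requirement might look like the following Python sketch. The file contents and the helper ids_match are hypothetical. Whether the two files must list individuals in the same order is not stated here, so the sketch compares the IDs as unordered collections.

```python
import csv
import io

# Hypothetical file contents; the column name "id" is shared by both files.
geno_text = "id,m1,m2\n,1,1\n,0.0,10.0\nind1,A,H\nind2,H,B\n"
pheno_text = "weight,id\n20.3,ind1\n18.9,ind2\n"


def ids_match(geno, pheno):
    """True if the genotype ID column appears in the phenotype data
    under the same name and with the same set of values."""
    g = list(csv.reader(io.StringIO(geno)))
    p = list(csv.DictReader(io.StringIO(pheno)))
    id_name = g[0][0]
    # Skip the chromosome and cM rows, whose first cell must be blank.
    geno_ids = [row[0] for row in g[1:] if row[0] != ""]
    if not p or id_name not in p[0]:
        return False
    pheno_ids = [row[id_name] for row in p]
    return sorted(geno_ids) == sorted(pheno_ids)
```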

CSVsr format: This is just like the csvs format, but with each file rotated (or really transposed), so that rows are columns and columns are rows.

Mapmaker format: This format requires two files.

  1. The so-called rawfile, specified by the argument file, contains the genotype and phenotype data.
    1. Rows beginning with the symbol # are ignored.
    2. The first line should be either data type f2 intercross or data type f2 backcross.
    3. The second line should begin with three numbers indicating the numbers of individuals, markers and phenotypes in the file. This line may include the word symbols followed by symbol assignments (see the documentation for mapmaker, and cross your fingers).
    4. The rest of the lines give genotype data followed by phenotype data, with marker and phenotype names always beginning with the * symbol.
  2. A second file contains the genetic map information, specified with the argument mapfile. The map file may be in one of two formats. The function will determine which format of map file is presented.
    1. The simplest format for the map file is not standard for the Mapmaker software, but is easy to create. The file contains two or three columns separated by white space and with no header row.
      1. The first column gives the chromosome assignments.
      2. The second column gives the marker names, with markers listed in the order along the chromosomes.
      3. An optional third column lists the map positions of the markers.
    2. Another possible format for the map file is the .maps format, which is produced by Mapmaker.
    3. Marker order is taken from the map file, either by the order they are presented or by the cM positions, if specified.
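
The simple two- or three-column map layout is easy to produce and to read. The Python sketch below parses it and applies the ordering rule described above: by cM position when every marker has one, otherwise in the order listed. The function name parse_simple_map and the marker names are hypothetical, and the code only illustrates the layout; it does not reproduce what read.cross does internally.

```python
def parse_simple_map(text):
    """Parse a whitespace-separated map: chromosome, marker name, optional cM."""
    markers = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore empty lines
        if len(fields) < 2:
            raise ValueError(f"expected at least 2 columns: {line!r}")
        chrom, name = fields[0], fields[1]
        pos = float(fields[2]) if len(fields) > 2 else None
        markers.append((chrom, name, pos))
    # With positions for every marker, order by position within chromosome;
    # otherwise keep the order in which the markers are listed.
    if all(pos is not None for _, _, pos in markers):
        first_seen = dict.fromkeys(m[0] for m in markers)
        chrom_order = {c: i for i, c in enumerate(first_seen)}
        markers.sort(key=lambda m: (chrom_order[m[0]], m[2]))
    return markers


# Hypothetical map with one marker listed out of position order.
example = parse_simple_map("1 D1M1 0.0\n1 D1M3 25.0\n1 D1M2 10.5\n2 D2M1 5.0\n")
```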

Map Manager QTX format: This format requires a single file (that produced by the Map Manager QTX program).

QTL Cartographer format: This format requires two files: the .cro and .map files for QTL Cartographer.

  1. These files are produced by the QTL Cartographer sub-programs Rmap and Rcross.
  2. Note that the QTL Cartographer cross types are converted as follows: 
    1. RF1 to riself, 
    2. RF2 to risib, 
    3. RF0 (doubled haploids) to bc, 
    4. B1 or B2 to bc, 
    5. RF2 or SF2 to f2.

Gary format: This format requires six files. All files have default names, and so the file names need not be specified if the default names are used.

  1. genfile (default = "geno.dat") contains the genotype data. The file contains one line per individual, with genotypes for the set of markers separated by white space. Missing values are coded as 9, and genotypes are coded as 0/1/2 for AA/AB/BB.
  2. mapfile (default = "markerpos.txt") contains two columns with no header row: the marker names in the first column and their cM position in the second column. If marker positions are not available, use mapfile=NULL, and a dummy map will be inserted.
  3. phefile (default = "pheno.dat") contains the phenotype data, with one row for each mouse and one column for each phenotype. There should be no header row, and missing values are coded as "-".
  4. chridfile (default = "chrid.dat") contains the chromosome identifier for each marker.
  5. mnamesfile (default = "mnames.txt") contains the marker names.
  6. pnamesfile (default = "pnames.txt") contains the names of the phenotypes. If the phenotype names file is not available, use pnamesfile=NULL; arbitrary phenotype names will then be assigned.
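
As a small illustration of the genotype coding in genfile, the Python sketch below decodes one whitespace-separated line. The name decode_gary_line and the sample line are hypothetical; using None for a missing value is a choice made for this sketch only.

```python
# Genotype codes for the Gary genfile as described above; 9 marks a missing value.
GARY_CODES = {"0": "AA", "1": "AB", "2": "BB", "9": None}


def decode_gary_line(line):
    """Decode one individual's line of whitespace-separated genotype codes."""
    return [GARY_CODES[code] for code in line.split()]


decoded = decode_gary_line("0 1 2 9 1")
```

An unexpected code raises KeyError, which is a convenient way to spot files that do not follow this coding.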

Karl format: This format requires three files; all files have default names, and so need not be specified if the default name is used.

  1. genfile (default = "gen.txt") contains the genotype data. The file contains one line per individual, with genotypes separated by white space. Missing values are coded 0; genotypes are coded as 1/2/3/4/5 for AA/AB/BB/not BB/not AA.
  2. mapfile (default = "map.txt") contains the map information, in a very complicated format.
  3. phefile (default = "phe.txt") contains a matrix of phenotypes, with one individual per line. The first line in the file should give the phenotype names.

GenABEL supported format

To import data into GenABEL, two files are needed:

  1. phenotypic data (relates study subjects IDs with values of covariates and outcomes)
    1. The first line gives a description (unique variable name) of the data contained in a particular column
    2. The first column of the phenotype file must contain each subject's unique ID and must be named "id" (the same IDs appear, in quotes, in the genotype file)
    3. "sex" column (0=female, 1=male); other columns contain phenotypic information; missing values are coded as "NA"
  2. genotypic data
    1. For every SNP, information on map position, chromosome, and strand should be provided
    2. For every person, SNP genotype should be provided
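
The phenotype-file rules above can be checked before import with a short script. The Python sketch below is an assumption-laden illustration: the whitespace delimiter, the helper name check_phenotypes, and the sample data are all invented, and it flags a missing sex column only because the description lists one, not because GenABEL is known to require it.

```python
def check_phenotypes(text):
    """Return a list of problems found in a phenotype table (empty if none)."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, records = lines[0], lines[1:]
    problems = []
    if header[0] != "id":
        problems.append("first column must be named 'id'")
    if len(set(header)) != len(header):
        problems.append("variable names must be unique")
    if any(len(r) != len(header) for r in records):
        problems.append("every row must have one value per column")
        return problems  # column lookups below need a rectangular table
    ids = [r[0] for r in records]
    if len(set(ids)) != len(ids):
        problems.append("subject IDs must be unique")
    if "sex" in header:
        col = header.index("sex")
        # Assumption: only 0 (female) and 1 (male) are accepted for sex.
        if any(r[col] not in ("0", "1") for r in records):
            problems.append("sex must be coded 0 (female) or 1 (male)")
    else:
        problems.append("no 'sex' column")
    return problems


good = check_phenotypes("id sex age\np1 1 43\np2 0 NA\n")
bad = check_phenotypes("subject sex\np1 2\np1 0\n")
```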

Illumina format: convert.snp.illumina

PLINK tped format: convert.snp.tped

Standard linkage-like file: convert.snp.ped

MACH format: convert.snp.mach

TASSEL supported format

  1. BLOB, HapMap, PLINK, Flapjack, and Phylip. Are these also supported for the pipeline?
  2. Numerical data format: Trait, covariate, TASSEL 2.1, Repeated measurements, Square Numerical Matrix (kinship), Genetic Map

FaST-LMM supported format

  1. SNP data to be tested (in PLINK format)
  2. SNP data for determining genetic similarities between individuals (in PLINK format)
  3. phenotype data
  4. (optional) a set of covariates

QxPak supported format

  1. Parameters file: how to integrate into DE?
  2. Data file: individual id (first column) and trait or effects (subsequent columns with missing value as 0)
  3. Pedigree file: individual id, father, mother, ((sex), breed)
  4. Marker file: 
    1. Usual format: first record: chromosome; successive records: individual, allele1_mkr1, allele2_mkr1, etc
    2. Transposed format: good for many more markers than individuals; first row: individual; following rows: SNP_name, chr_number, ind1 allele1, ind1 allele2, ind2 allele1, etc
  5. User defined covariance matrix file: row, column, value
  6. Haplotype file: first row: name of chromosome; following rows: individual, order of markers where phases known

VBAY supported format

  1. Vbay takes genotypes in the widely used TPED format of Plink [2] (http://pngu.mgh.harvard.edu/~purcell/plink/), and genotypes must be coded as 1 or 2. Convert a Plink genotype file to this format using the options --transpose --recode12
  2. Phenotype data and sex data are read from TFAM files generated with the Plink options above. In general, the format is 6 columns for each individual:
    1. Family ID Individual ID Paternal ID Maternal ID Sex (1=male; 2=female; other=unknown) Phenotype
    2. For case/control data, the phenotype is coded as in Plink, whose default is 1=unaffected (control) and 2=affected (case).
  3. Covariates are stored in a text file where each row is an individual and each space-delimited column is a covariate. A mean term is included by default.
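
The six-column TFAM layout described above can be read with a few lines of Python. This sketch is illustrative only: TfamRecord and parse_tfam_line are hypothetical names, the sample line is invented, and the phenotype is left as raw text rather than interpreted.

```python
from typing import NamedTuple


class TfamRecord(NamedTuple):
    family_id: str
    individual_id: str
    paternal_id: str
    maternal_id: str
    sex: str
    phenotype: str


# Sex coding as described above: 1=male, 2=female, anything else is unknown.
SEX_LABELS = {"1": "male", "2": "female"}


def parse_tfam_line(line):
    """Split one TFAM line into its six fields and label the sex code."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}")
    rec = TfamRecord(*fields)
    return rec._replace(sex=SEX_LABELS.get(rec.sex, "unknown"))


record = parse_tfam_line("FAM1 IND1 0 0 1 2")
```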

MECPM supported format

Simple text format with a SNP ID followed by the genotypes in each row. No trait data?

Random Jungle supported format

PED format plus trait data

QTL Network supported format

QTLNetwork also accepts the OUT data format of QTL Cartographer (.map and .cro) and the data format of MapMaker/QTL (.map and .raw). Note that the marker number and order in the RAW file (.raw) of MapMaker/QTL must be exactly consistent with those in the map file. The population type after the keyword “TYPE” must follow the same specifications as QTLNetwork, i.e. “F2” for an F2 population, “DH” for a DH population, “RI” for recombinant inbred lines, and “B1” and “B2” for backcross populations to P1 and P2.

  1. Mar 28, 2012

    Niki Robertson says:

    Can the sex column be omitted in GenABEL phenotypic data?