How to run dChip with Affymetrix Power Tools probeset-genotype to calculate copy number variation (CNV), loss of heterozygosity (LOH), and major copy proportion (MCP).

by Russell Hanson, 8/19/10. www.russellhanson.com

1) Download CEL files from the internet, on GEO (the Gene Expression Omnibus):
   http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE16619
   You get a .tgz with many .CEL files.

2) Unzip the CEL files.

3) Generate the genotype calls using Affymetrix Power Tools.
   Download Affymetrix Power Tools from the Affymetrix site (Google "apt-probeset-genotype").

   Need five files total:
   1: Chip Description File: Mapping250K_Nsp.cdf
   2: .chrx file: Mapping250K_Nsp.chrx
   3: Genome Info File: Mapping250K_Nsp genome info hg17.txt
      (For these three, you need the proper NSP or STY type; they are different files. We typically find the STY type to have lower noise than the NSP type.)
   4: Sample Info File: "OvarianNSP normal Sample_Info.XLS"
   5: cel_file_list.txt for apt-probeset-genotype

   This is the command to run:

   Russell Hanson@RussellPC /cygdrive/g/russell/Prostate/ProstateNSP
   # apt-probeset-genotype -c Mapping250K_Nsp.cdf --chrX-snps Mapping250K_Nsp.chrx -o results_dir --cc-chp-output --cel-files cel_file_list.txt
   Read 40 cel files from: cel_file_list.txt
   Running ProbesetGenotypeEngine...
   Beginning analysis of 40 cel files.
   Opening layout file: Mapping250K_Nsp.cdf
   Reading 262338 probesets.........................................Done.
   Kept 262338 probesets.
   Read 5615 ChrX SNPs from chrX SNPs file Mapping250K_Nsp.chrx (v2)
   Reading and pre-processing 40 cel files........................................Done.
   Processing 1 chipstream.
   Computing sketch normalization for 40 cel datasets........................................Done.
   Applying sketch normalization to 40 cel datasets........................................Done.
   Finalizing 1 chipstream.
   Setting 262314 seeds for 262338 probesets.
   Using gender method dm-chrX-het-rate for genotype calling.
   Using inbred covariates none for genotype calling.
   Computing prior with: 10000 initial snps.
   4675 of 10000 (46.75%) of SNPs had at least 2 observations per genotype.
   4675 SNP clusters used to generate prior.
   Creating temporary files for CHP output
   Processing probesets.........................................Done.
   Flushing output reporters. Finalizing output.
   Creating final files for CHP output
   Finalizing Multi Data CHP Files.........................................Done.
   Run took approximately: 144.58 minutes.
   Done running ProbesetGenotypeEngine.

4) Put dChip, the exported .chp files, and the sample info file into the same directory.
   To get dChip, Google "dchip download", e.g.: dchipMar312010.exe

5) Make a sample info file:

   Scan Name   Sample   Gender   Ploidy(numeric)   Grade   Tumor (primaries or ascites)
   NA10851_FinSty_vR2_578246_A1_2_SC1   NA10851_FinSty_vR2_578246_A1_2_SC1   female   2
   NA10855_FinSty_vR2_555824_A4_2_SC1   NA10855_FinSty_vR2_555824_A4_2_SC1   female   2
   NA10863_FinSty_vR2_554144_A10_2_SC1   NA10863_FinSty_vR2_554144_A10_2_SC1   female   2
   NA11831_FinSty_vR2_555824_A5_2_SC2   NA11831_FinSty_vR2_555824_A5_2_SC2   female   2
   NA11832_FinSty_vR2_555824_A6_2_SC3   NA11832_FinSty_vR2_555824_A6_2_SC3   female   2
   NA12056_FinSty_vR2_578246_A2_2_SC2   NA12056_FinSty_vR2_578246_A2_2_SC2   female   2
   NA12057_FinSty_vR2_578246_A3_2_SC3   NA12057_FinSty_vR2_578246_A3_2_SC3   female   2
   NA12234_FinSty_vR2_554144_A12_2_SC2   NA12234_FinSty_vR2_554144_A12_2_SC2   female   2
   GSM302739   GSM302739   n   n
   GSM302740   GSM302740   n   n
   GSM302741   GSM302741   n   n
   GSM302742   GSM302742   n   n
   GSM302743   GSM302743   n   n
   GSM302744   GSM302744   n   p
   GSM302745   GSM302745   n   n

   The names above are all of the .CEL files, but you have to remove the ".CEL" from the names in the Sample Info file only, not from the actual .CEL files on disk.

6) Put the chip description file (e.g. Mapping250K_Nsp.cdf) and the matching genome info file (e.g. Mapping250K_Nsp genome info hg17.txt) into the folder with dChip and the .CHP and .CEL files. Both files must be for the same array type (NSP or STY) as your samples.
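
The two text files from steps 3 and 5 (cel_file_list.txt and the sample info file) can both be derived from the .CEL file names, so they can be generated rather than typed by hand. Below is a minimal Python sketch, not part of the original procedure. Assumptions: the function names and the output file names are made up for illustration; the CEL list is written with a first line of "cel_files" followed by one file per line, which is the layout apt-probeset-genotype expects for --cel-files as far as I know; and the sample info file is assumed to be tab-delimited with the column headers shown in step 5. The Gender, Ploidy, Grade, and Tumor columns are left blank for you to fill in.

```python
import csv
import os

# Column headers copied from the sample info example in step 5.
SAMPLE_INFO_COLUMNS = (
    "Scan Name", "Sample", "Gender", "Ploidy(numeric)",
    "Grade", "Tumor (primaries or ascites)",
)


def scan_name(cel_path):
    """Return the file name with a trailing .CEL (any case) removed."""
    base = os.path.basename(cel_path)
    root, ext = os.path.splitext(base)
    return root if ext.lower() == ".cel" else base


def write_cel_list(cel_paths, out_path):
    """Write a CEL list: a 'cel_files' header line, then one path per line.

    The header line is an assumption about what --cel-files expects.
    """
    with open(out_path, "w", newline="\n") as fh:
        fh.write("cel_files\n")
        for path in cel_paths:
            fh.write(path + "\n")


def write_sample_info(cel_paths, out_path, columns=SAMPLE_INFO_COLUMNS):
    """Write a tab-delimited sample info skeleton.

    Scan Name and Sample are both set to the CEL name without ".CEL";
    all remaining columns are left empty to be filled in by hand.
    """
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(columns)
        for path in cel_paths:
            name = scan_name(path)
            writer.writerow([name, name] + [""] * (len(columns) - 2))
```

For example, calling write_cel_list(["GSM302739.CEL", "GSM302740.CEL"], "cel_file_list.txt") and write_sample_info on the same list gives a CEL list that keeps the ".CEL" extension and a sample info skeleton that drops it, which is the distinction step 5 asks for.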
7) Download normal samples from the Affymetrix site, of the type matching your sample type (STY/NSP/HIND/XBA, 500K/SNP5/SNP6), and put them in the same folder, with both the CEL and the CHP or TXT files. The normals should be the same ones listed in the sample info file described above.
   http://www.affymetrix.com/support/mas/datasets.affx#1_2
   e.g., for 500K STY SNP data:
   http://www.affymetrix.com/support/downloads/data/500K_data.sty.1.zip

8) In dChip, do "Open Group":
   File type: "CEL"
   Suffix: .chp  <- make sure the suffix and its case match the output from Affymetrix Power Tools' apt-probeset-genotype
   Also specify the Mapping250K_Nsp.cdf in Open Group.

9) In dChip Options, under Model: Compute A & B allele signals for SNP array
   In dChip Options, under Chromosome: Infer MCP

10) Normalize and Model: *uncheck* view normalization plot, *OK* --- wait a few hours.

11) Analysis/Chromosome --- should display the LOH inferred from the Hidden Markov Model:

   Inferring LOH using the 'Hidden Markov Model' method...
   Estimated sample LOH rate:
   0.43 0.34 0.09 0.42 0.30 0.21 0.39 0.20 0.37 0.31 0.37 0.25 0.12 0.38 0.45 0.41 0.19 0.06 0.22 0.25
   0.25 0.24 0.13 0.36 0.32 0.40 0.16 0.56 0.46 0.36 0.29 0.40 0.23 0.25 0.31 0.41 0.37 0.37 0.29 0.23
   0.45 0.34 0.32 0.38 0.32 0.37 0.16 0.30 0.24 0.35 0.30 0.19 0.36 0.34 0.50 0.37 0.13 0.08 0.43 0.11
   0.42 0.40 0.20 0.27 0.44 0.28 0.41 0.37 0.37 0.21 0.47 0.44 0.03 0.29 0.24 0.28 0.31 0.34 0.34 0.36
   0.38 0.00 0.01 0.00 0.00 0.25 0.00 0.01 0.00 0.01 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.01 0.00 0.01 0.00
   Chro MT, marker 1 - 119, Sample THING_p_TCGA_B19_SNP_N_GenomeWideSNP_6_H08_495198_Signal_A
   Inferred sample LOH rate:
   0.55 0.49 0.23 0.55 0.45 0.34 0.57 0.34 0.44 0.44 0.50 0.37 0.22 0.54 0.51 0.53 0.38 0.19 0.29 0.35
   0.37 0.37 0.24 0.46 0.46 0.53 0.25 0.62 0.54 0.42 0.39 0.53 0.29 0.39 0.41 0.51 0.50 0.52 0.39 0.32
   0.55 0.47 0.51 0.49 0.44 0.46 0.31 0.39 0.39 0.49 0.46 0.32 0.46 0.44 0.64 0.55 0.26 0.26 0.52 0.23
   0.54 0.49 0.38 0.30 0.57 0.40 0.45 0.50 0.54 0.32 0.61 0.52 0.19 0.40 0.38 0.37 0.43 0.50 0.49 0.50
   0.48 0.10 0.15 0.13 0.13
   0.34 0.12 0.15 0.10 0.14 0.13 0.13 0.14 0.13 0.12 0.13 0.13 0.14 0.12 0.13 0.10

12) To export the copy number data, go to: Chromosome, "Export Data".

13) You get a tab-delimited plain-text .xls file on disk, e.g.:
    OvarianSTY_dChip_inferred_SNP call_data.xls

14) To switch views between MCP, LOH, and log2 ratio, hit "d" on the keyboard to switch between data types.

15) That's it!! You have Excel files with the MCP, CNV, or log2 ratio data.
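
Since the exported .xls file from step 13 is really tab-delimited plain text, it can also be read by a script instead of Excel. Below is a minimal Python sketch, not part of the original procedure. Assumptions: the function names are made up for illustration, the first non-empty row of the export is taken to be a header row, and the column names you pass in have to come from your own export, because the exact column layout that dChip writes is not described here.

```python
import csv


def read_tab_delimited(path):
    """Read a tab-delimited text file and return (header, rows).

    Blank lines are skipped. The first remaining row is treated as the
    header, which is an assumption about the export layout.
    """
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        rows = [row for row in reader if row]
    if not rows:
        return [], []
    return rows[0], rows[1:]


def column_as_floats(header, rows, name):
    """Return the numeric values found in the column called `name`.

    Cells that are missing or not parseable as numbers are skipped.
    """
    idx = header.index(name)
    values = []
    for row in rows:
        if idx < len(row):
            try:
                values.append(float(row[idx]))
            except ValueError:
                pass  # non-numeric cell, e.g. a text call or an empty cell
    return values
```

A typical use would be header, rows = read_tab_delimited("OvarianSTY_dChip_inferred_SNP call_data.xls"), then inspecting header to see which columns the export contains before pulling out one sample's values with column_as_floats.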