Pipeline on microarray data analysis: Pre-processing

Authors

  • Rohmatul Fajriyah, Universitas Islam Indonesia
  • Noodchanath Kongchouy, Prince of Songkla University
  • Wanvisa Saisanan Na Ayudhaya, Walailak University
  • Rahmadi Yotenka, Universitas Islam Indonesia
  • Ghiffari Ahnaf Danarwindu, Universitas Islam Indonesia

DOI:

https://doi.org/10.12928/bamme.v5i1.12539

Keywords:

affymetrix, bioinformatics, microarray, pre-processing

Abstract

Bioinformatics is flourishing, and its data are stored in offline and online repositories. Yet some basic concepts have not been fully disseminated. This paper reviews one important concept in the bioinformatics pipeline for microarray data analysis: pre-processing. Pre-processing comprises four steps: background correction, normalization, probe correction, and summarization. Each step can be carried out by several methods, and we describe each method to give a better understanding of how it works in theory. We focus on microarray data from the Affymetrix platform with single-color chips.
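To make the four steps concrete, the following is a minimal Python sketch of one possible pre-processing chain, loosely in the spirit of RMA. It is an illustration under stated assumptions, not the method of any particular package: all function names are hypothetical, the background step is a deliberately simple percentile subtraction rather than a model-based correction, probe correction is taken to be the PM-only choice (mismatch probes are ignored), and summarization uses Tukey's median polish on log2 intensities.

```python
import numpy as np

def background_correct(pm, floor=1.0):
    """Toy background correction (an assumption, not a model-based method):
    subtract each array's 5th percentile, then clip values at `floor`."""
    bg = np.percentile(pm, 5, axis=0, keepdims=True)
    return np.maximum(pm - bg, floor)

def quantile_normalize(x):
    """Quantile normalization for a probes x arrays matrix.
    Every array receives the same empirical distribution: the mean of the
    sorted columns. Ties are broken arbitrarily by argsort."""
    order = np.argsort(x, axis=0)
    ranks = np.argsort(order, axis=0)           # rank of each probe within its array
    mean_quantiles = np.sort(x, axis=0).mean(axis=1)
    return mean_quantiles[ranks]

def probe_correct(pm, mm=None):
    """PM-only probe correction: mismatch intensities, if given, are ignored."""
    return pm

def median_polish_summarize(log_pm, n_iter=10):
    """Tukey's median polish for ONE probe set (probes x arrays, log2 scale).
    Returns one expression value per array: overall effect + array effect."""
    resid = np.array(log_pm, dtype=float)       # work on a copy
    overall = 0.0
    row = np.zeros(resid.shape[0])              # probe effects
    col = np.zeros(resid.shape[1])              # array effects
    for _ in range(n_iter):
        row_med = np.median(resid, axis=1)
        resid -= row_med[:, None]
        row += row_med
        shift = np.median(col)
        col -= shift
        overall += shift
        col_med = np.median(resid, axis=0)
        resid -= col_med[None, :]
        col += col_med
        shift = np.median(row)
        row -= shift
        overall += shift
    return overall + col

def preprocess(pm, probe_sets):
    """Run the four steps in order.
    pm: probes x arrays matrix of perfect-match intensities.
    probe_sets: dict mapping a probe-set name to a list of row indices."""
    corrected = background_correct(pm)
    normalized = quantile_normalize(corrected)
    pm_only = probe_correct(normalized)
    log_pm = np.log2(pm_only)
    return {name: median_polish_summarize(log_pm[rows])
            for name, rows in probe_sets.items()}
```

Calling `preprocess(pm, {"geneA": [0, 1, 2]})` on a probes x arrays matrix returns one log2 expression value per array for each probe set. The order of the steps follows the abstract; real pipelines differ in which method they choose at each step.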

Published

2025-07-18

Section

Articles