The Shivplot: a graphical display for trend elucidation and exploratory analysis of microarray data

Woody, Owen Z; Nadon, Robert

doi:10.1186/1751-0473-1-6

Methodology
Open access
Published: 08 November 2006

The Shivplot: a graphical display for trend elucidation and exploratory analysis of microarray data

Owen Z Woody^1,2 &
Robert Nadon^1,3

Source Code for Biology and Medicine volume 1, Article number: 6 (2006) Cite this article

9235 Accesses
Metrics details

Abstract

Background

High-throughput systems are powerful tools for the life science research community. The complexity and volume of data from these systems, however, demand special treatment. Graphical tools are needed to evaluate many aspects of the data throughout the analysis process because plots can provide quality assessments for thousands of values simultaneously. The utility of a plot, in turn, is contingent on both its interpretability and its efficiency.

Results

The shivplot, a graphical technique motivated by microarrays but applicable to any replicated high-throughput data set, is described. The plot capitalizes on the strengths of three well-established plotting graphics – a boxplot, a distribution density plot, and a variability vs intensity plot – by effectively combining them into a single representation.

Conclusion

The utility of the new display is illustrated with microarray data sets. The proposed graph, retaining all the information of its precursors, conserves space and minimizes redundancy, but also highlights features of the data that would be difficult to appreciate from the individual display components. We recommend the use of the shivplot both for exploratory data analysis and for the communication of experimental data in publications.

Background

Microarrays [1] have provided a wealth of gene expression data for the biological community to interpret. The technology presents a snapshot of cellular transcription at an unprecedented level of detail, with certain array designs containing probes to assess expression levels of every known gene within the target organism's genome. Microarray data have been presented in an ever-growing number of publications, and confidence in the technology is growing as progress is made validating microarray findings.

The analysis of microarray data remains challenging, however, for a number of reasons. The technology is expensive, often restricting the number of replications that can be run in an experiment. Moreover, the validity of various statistical models and their corresponding processing algorithms for gene expression data are being actively debated [2, 3].

One of the more daunting but unavoidable aspects of microarrays, as with all high-throughput systems, is the sheer volume of data that must be examined. Researchers are unable to develop comprehensive familiarity with each of the millions of data points available in a typical microarray data set. Multi-display graphical methods are thus key for reaching an understanding of the data, for assessing analysis assumptions, and for examining the effects of data pre-processing methods.

Plotting allows the rapid and concise representation of the data as a whole, and allows trends and local aberrations to be spotted with relative ease [4]. These trends are useful for more than just diagnostic assessments; due to the relative paucity of replicate observations in many studies, supplementary information is frequently gathered from observations with similar expression properties. This information can be used, for example, to obtain an improved estimate of inter-array variability [5].

A further motivation for concise and elegant graphics is the space restrictions typically imposed by scientific journals. A thorough and well designed graphic needs to be explained just once, and its subsequent re-use can allow the reader rapid insight into volumes of visual data.

The construction of comprehendible graphics, however, is a challenge of its own. Unnecessary ink must be kept to a minimum, and it is easy to obscure or even mask important points through graphs that are complex or confusing. Efficiency, readability, and relevance are of paramount concern.

With these criteria in mind, the present work illustrates an aggregate graphical approach, inspired by (but not restricted to use with) microarray data. The proposed technique pools the strengths of several other graphical methods which are commonly applied to microarray data: a boxplot, a probability density function plot, and a plot illustrating variability as a function of signal intensity. The present work demonstrates that these plots can be productively combined, facilitating integration of complementary information. Although this plot design is motivated by microarray analysis, we present this technique as a viable tool for both exploration and publication of other types of high volume multivariate data sets.

Component plots

This section reviews the component plots that are assembled in our proposed graphical method. Simulated and empirical microarray data are used to illustrate the strengths and weaknesses of these respective plots when presented individually. The microarray data used in this section were obtained from the publicly available Affymetrix GeneChip spike-in data [6].

Boxplots

The boxplot [7] is an example of a concise graphic in which several critical pieces of information are presented about a univariate distribution. Firstly, the median is illustrated as a dot in the "box", with the position of the dot relative to the axis giving the specific value. The interquartile range (IQR), which contains the middle 50% of the data, is shown by the length between the outer edges of the box. The points farthest from the median that are not classified as outliers are marked as the tips of the whiskers extending from the box. In addition to these five distributional landmarks, outliers are explicitly plotted as lines or dots beyond the ends of the whiskers. A rule of thumb, such as 1.5 times the IQR beyond the edges of the box, is typically employed to distinguish how far from the center an observation must be to be classified as an outlier. Examples of boxplots can be seen in Figures 1a–c.

The five-point summary delivered in the box-and-whisker portions of boxplots allows rapid access to many aspects of the distribution. For example, indications of skewness can be seen both in the position of the median within the interquartile range box (closer to either end implicates skewness) and the lengths of the respective whiskers extending from the box (if one whisker is much longer than the other, skewness is a possible explanation).

Boxplots, however, have shortcomings. Although they are easily interpreted and are pleasing to the eye, they have poor data-ink ratio [8]. The "box" component of a boxplot can be disassembled into its 5 composite values, with the remaining ink merely providing assistance to the eye. For example, only one half of the boxplot, as divided by the horizontal axis of bilateral symmetry through the center of the boxplot, is necessary.

Furthermore, enclosing a "boxed" region encourages the interpretation of area, a quantity which is arbitrarily determined by the boxplot's vertical width. This two-dimensional depiction only distracts from the interpretation of one-dimensional distance. Although the marked quantities and their positions relative to one another are relevant, a majority of the lines surrounding and joining these values are not. Boxplots also potentially mask underlying features of the distribution, such as tail densities and multiple modes [9]. As an illustration of this hazard, compare the boxplot generated from a normal distribution (Figure 1a) to the boxplot of a distinctly bimodal (mixture of two non-overlapping normals) distribution (Figure 1b). From these two illustrations alone, it would be very difficult to discern that the two distributions have a different number of modes. This information can be recovered, however, through use of the density plot.

Density plots

Perhaps the easiest way to introduce the density plot is to first examine the related concept of a histogram [10]. Histograms work on a univariate data set by first stratifying the data into "bins" based on value, and then plotting the population count of each bin. The resulting graph displays how the data values are distributed; bins containing many observations have taller bars than sparsely populated bins. Histograms are excellent for detecting multiple modes, skewness, and kurtosis. Examples of histograms can be seen in Figures 1d–f.

Density plots [11] offer what is in essence a "smoothed" histogram; instead of using discrete bins and counts, density plots employ a continuous curve to communicate the same information. Area is used to convey the probability of observations within specified ranges. Specifically, were a sample value drawn from the depicted distribution, the probability of this value lying within a given interval can be determined by taking the integral of the curve bounded by that interval. Example density plots can be seen in Figures 1g–i. Like the histogram, density plots are excellent for detecting multiple modes, skewness, and kurtosis. The limitation of any approach that employs frequency as a metric is that extreme values and outliers are given little credence. These outlying values can carry a high leverage on important quantities such as the mean, yet they have only a subtle effect on the density curve. Although density plots do an excellent job estimating the position and number of modes, the mode is itself a highly variable estimator of the center of a distribution, and it is challenging to visually estimate the position of a mean or median from the density curve.

Variability-versus-intensity plots

It is typical for the variability of replicated microarray data to exhibit a dependence on intensity, whether replication is across or within arrays and whether the variability reflects processing effects only or processing plus biological effects. One common approach is to log gene expression data, which tends to stabilize error variance across replicates for mid-to-upper range intensity values but which has the disadvantage of inflating additive error for low intensity values. Alternative approaches to microarray data transformation model both additive and proportional error components (see the "generalized log" methods of [12–16]). Variability versus intensity plots can be used to assess how successfully such transformations have stabilized the variance throughout the entire intensity range. Additionally, the plot can also be used as a visual aid for pooling estimates of random error associated with genes of similar expression intensity [3, 17, 18].

The relationship of variability to mean intensity (i.e. the trend line) is the important piece of information to be obtained from this plot. The vast majority of points on the plot are of little interest to the viewer and can make the visual estimation of the trend challenging when data points are densely packed. As a method of capturing the trend of variability as a function of mean intensity, one can employ any of the loess family of smoothers [4, 19], which perform local weighted least squares regression to capture nonlinear trends in data. (An example loess fit can be seen in Figure 2c.)

Results and discussion

The Shivplot

As can be seen from Figure 2a–c, all three of the discussed plots – boxplot, density plot, and variability-versus-intensity (standard deviation vs mean) – have an axis in common. Figure 2d presents the shivplot, an amalgamation of the three plots. To first capture the essence of the boxplot, we eliminate its vertical width, thereby reducing it down to point components: the five distributional landmarks along with the outliers. These points can be displayed as ticks across the middle of the graph, at y = 0. Then, in the region below, we can draw the reflection of the density function plot (with the scale starting at zero and increasing downward along the lower half of the y-axis). The area enclosed by this curve below y = 0 represents the probability density. Lastly, we can draw the loess fit from the variability-versus-intensity plot in the upper half of the graph. One can quickly isolate the information in any of the three precursor graphs from the final image – there is no significant loss of information when the plots are superimposed. (The values along the right side of the graph provide maximum values: the maximum standard deviation above, and the maximum density below.)

The data interpreter needs to adjust to two aspects of this graph: The upper and lower halves of the plot have distinct y-axes, with different scales and units, and the density distribution is inverted. However, having the three graphs simultaneously available provides the advantage that each plot can readily be interpreted in the context of the other two. The advantages of this feature are best illustrated by examining empirical data, as provided in the following section.

Examples

There is an ongoing debate about the appropriate analysis of data derived from microarrays. As such, it is of great interest to biologists and statisticians alike to observe the impact of different statistical algorithms on microarray data. Having approachable visual tools with which to make such comparisons greatly expedites this tedious and iterative process.

The examples in this section come from three different data sets. To build familiarity with the shivplot, we use data from the Affymetrix U133 spike-in data set to demonstrate the appearance of the components of the shivplot, and then follow this up with their shivplot equivalent. From these shivplots, we will be able to evaluate distributional features that distinguish the expression estimates produced by different Affymetrix normalization procedures from one another.

Beyond distinguishing the nuances of algorithms, the shivplot can also be used to detect interesting distributional properties of data sets that typical exploratory graphics may understate or miss. To demonstrate this utility, we have obtained the data from two microarray studies that were specifically designed to examine technical aspects of microarray data structure. Specifically, data sets produced by [20] and [21] will be examined, as these data sets highlight unique advantages of the shivplot in an exploratory context.

For these latter two data sets, we also present a slight variation on our shivplot theme. When a large number of distributions are to be compared in tandem, granting each shivplot its own unique frame and axes may require too much space to be efficient. Separate frames also make it difficult to perceive inter-relationships among density, variability, and signal intensity. To address these issues, we introduce a multi-sample shivplot (Figures 8 and 9), in which each distribution is depicted upon a common set of axes.

Affymetrix spike-in data

As mentioned earlier, the microarray manufacturing company Affymetrix has produced spike-in data sets where the underlying transcript quantity information is engineered and thus known in advance. Such data sets have been used to design [22], field test [23], and compare [24] statistical algorithms. The specific details of these spike-in experiments can be found at the aforementioned website. We shall use our shivplot to compare a subset of Affymetrix analysis algorithms.

Figure 3 shows the individual components of the shivplot produced by three algorithms: MAS 5.0 [22], as implemented in [25], RMA [23], as implemented in [25], and dChip [26], as implemented in their dChip software, version '2004'. Figure 4 shows the shivplot representation of these data. Note that a wide variety of information can be gleaned from this side-by-side shivplot comparison. For example, the three metrics clearly do not share an absolute scale. Furthermore, they are in disagreement over the general distribution of points. RMA, for example, has a larger proportion of very small values compressed within a small range and with outliers solely in the high expression range compared to the other examined algorithms. Both MAS 5.0 and dChip, by contrast, produce approximately symmetrical distributions, as can be seen in the density portion of the shivplot.

MAS 5.0 has a step in its algorithm in which unrealistically small gene expression values are replaced with imputed values. The consequences of this approach can be seen uniquely in the MAS 5.0 shivplot as a decrease in variability across replicates at the (partially imputed) low end. A similar dip in average variability can be see at the low end of the RMA intensity spectrum, although this phenomenon does not have a satisfying methodological explanation like the one offered for MAS 5.0. The comparative plot also shows the substantially lower replicate variability of RMA relative to MAS 5.0 and dChip, a feature of the RMA algorithm that has been well documented [24]. Though dChip has relatively stable variance at the high end, there is an escalation in variability near the low end.

Figures 5a–c illustrate the shivplots scaled individually. Note that on this plot it is much easier to detect details such as the non-uniformity (relative to intensity) of RMA cross-replicate variability, a detail which was lost when the RMA variability was scaled against the comparatively massive variability of the dChip low end.

Kendziorski et al. pooling data

This study, assessing the merits of pooling biological samples, examined whether array-to-array variability was directly related to degree of pooling (i.e., 2 or 3 samples pooled per hybridization) as would be predicted by theory. Additionally, several technical array replications were performed on a subset of the samples used in the study, allowing for an evaluation of array-to-array variability when biological material is invariant. Five types of replication were taken from this study – biological replication, in which one animal is used per array, pools of 2 (or 3) animals, where multiple animal samples are pooled on each array, and two levels of technical replication – one where the sample taken from a single animal is processed on multiple arrays, and one where a pool of all available animal samples is processed on multiple arrays. Note that in the latter two groups, the array-to-array variability is purely technical – the same sample material is applied to each array. The former three capture various levels of "biological" variability – that is to say, an estimate of how much gene expression varies from animal to animal when treatment is invariate. Note that one would expect the observed inter-array variability to diminish as more animals are pooled per array (as pooling should act as a sort of "biological averaging" prior to hybridization). A multi-sample shivplot can be used to examine relationships and trends in the variability of feature expression levels as a function of both degree of pooling and/or normalization strategy. Figure 6 depicts the variability vs intensity relationship when the RMA normalization procedure is used to produce expression summary measures for each level of pooling individually. As shown by the top part of the shivplot (the variability vs intensity component), there is a striking difference at the high end between groups that retain some degree of biological variability (biological replicates, as well as pools of 2 and 3 animals) and those within which variability is purely technical (replicates of the same animal or replicates of the pool of all animals). Furthermore, for the medium-to-high end, the magnitude of the variability corresponds nicely with the amount of pooling, with pools of fewer animals producing more variable expression estimates than those with several animals.

Note that the simultaneous interpretation offered by the multi-sample shivplot is particularly helpful in making assessments about observed trends. Had only boxplots and/or expression density plots been examined, the distinction between data sets based on variability would have been invisible. On the other hand, however, solely examining the variability plot would allow the viewer to notice these trends but not allow any evaluation of their consequence. Differences at the low and high ends of measurement spectra may simply reflect limited sample size, and it is thus crucial to be able to supplement any difference in variability trend with information about the relative number of observations affected. If the differences only concern the lowest 5% of the data, for example, any trends observed may be artifacts of sampling, and even real effects may have little consequence when so few data points are influenced. Moreover, note that despite differences in the variance-intensity relationship, the distributions of the intensity measurements were highly similar. This information is difficult to recover when examining the components individually. Figure 7 was generated by examining various normalization strategies on biological replicates of a common experimental condition (with no pooling). Two additional normalization procedures are introduced in this figure, the GC-RMA [27] and GLA [28] algorithms. This allows us to observe how the various normalization algorithms influence trend in the variability observed in Figure 6, and how they affect the distribution of the data. Note that in this shivplot, the importance of the simultaneous presentation of the components is again evident. In particular, the variability plot alone would tempt the interpretation that the normalization algorithms can be roughly divided into two groups – those that have elevated low-end variability and those that do not. However, the boxplot and distribution density plots reveal that the story is more complicated. Specifically, although the measures with high variance at the low end seem to have relatively homogenous density distributions, the measures with low variance at the low end have strikingly different density distributions, resulting in completely different interpretations of the distribution of the same biological data (i.e. GC-RMA is bimodal, RMA is platykurtotic and possibly also bimodal, GLA has compressed range). This provides another example where examining the components of the shivplot individually would reduce their interpretability.

Choe et al. Golden Spike data

The final data set we examine here was the Golden Spike experiment produced for [21]. This data set was engineered so that a vast majority of the effect sizes between treatment and control condition were known in advance, as determined by a predetermined spike-in design. Although designed to serve as a benchmarking data set for algorithm comparison, the golden spike data set produced results at odds with other benchmark data sets and has been critiqued on methodological grounds [29]. More specifically, it was remarkable that in the comparison performed in [21], the standard RMA algorithm performed quite poorly, and in particular worse than MAS 5.0. This finding contrasts with results of other method comparison studies. In an attempt to identify a possible explanation for this unusual result, we apply the shivplot successively to the RMA and the MAS 5.0 processing of the Golden Spike expression data.

On one hand, Figure 8 presents a multi-sample shivplot of RMA normalized data, distinguished by group and by whether normalization was performed across conditions. It is noteworthy that the variance vs intensity relationship is quite different across conditions, even after executing the very strong quantile normalization procedure, integral to RMA, which largely homogenizes the expression measure distributions. This effect persists regardless of whether the conditions are normalized together or separately. As such, quantile normalization did not produce the variability discrepancy. An explanation of this discrepancy would be of relevance to anyone using variability to weight the significance of differential expression.

As in the analysis of the previous data set, the shivplot's simultaneous presentation of the variability-vs-intensity plot and the density distribution is clearly advantageous. Had the variability plot been examined individually, it would have been challenging to correctly predict the homogeneity of the component distributions. Similarly, it would have been difficult to predict from the highly similar distributional plots that the variability relationship would differ so markedly. The findings in this plot offer a graphical explanation why the (unusual) second round of normalization was deemed necessary by [21] to analyze their data.

Figure 9, on the other hand, which presents the multi-sample shivplot of the MAS 5.0 normalized data, tells a different story. In contrast to the crossing trend lines observed in Figure 8, the variance-intensity relationships for the control and treatment groups were highly similar, except for a small number of low intensity measurements.

It seems likely that either RMA has produced this unusual difference between groups (although, as we have demonstrated, the quantile normalization is unlikely to blame), or some aspect of the MAS 5.0 normalization procedure has managed to eliminate it from the raw data. While further detective work is beyond the scope of the present work, the shivplot was a pivotal tool in uncovering this intriguing phenomenon in the Golden Spike data.

Availability

R/S-PLUS code written by the authors to produce shivplots is available in a flat file [see Additional File 1]. Also provided are a tutorial demonstrating the use of the code [see Additional File 2] and example data sets formatted for use with the code [see Additional Files 3, 4]. Alternatively, the entire shivplot library is available as a package for R in either source form [see Additional File 5] or as a Windows binary [see Additional File 6].

Conclusion

Microarray data, like data obtained from all high-throughput assays, represent a complex system, and significant amounts of space must be allocated in each publication to introduce and familiarize the reader with the technology and methodology used in any analyses. With the space restrictions imposed by many scientific journals, it is important to present as much information as cleanly as possible in concise graphics. The present work illustrates how three graphical tools of growing importance to microarrays and other high-throughput platforms can be effectively combined into a single plot, not only saving space but facilitating insights obtainable from complementary simultaneous interpretation. Furthermore, the introduced tool is general enough to serve a variety of high-throughput purposes, both in microarray data analysis and exploratory data analysis in general.

References

Schena M, Shalon D, Davis RW, Brown PO: Quantitative Monitoring of Gene-Expression Patterns with a Complementary-DNA Microarray. Science. 1995, 270: 467-470. 10.1126/science.270.5235.467.
Article CAS PubMed Google Scholar
Craig BA, Black MA, Doerge RW: Gene expression data: The technology and statistical analysis. Journal of Agricultural Biological and Environmental Statistics. 2003, 8: 1-28. 10.1198/1085711031256.
Article Google Scholar
Nadon R, Shoemaker J: Statistical issues with microarrays: Processing and analysis. Trends in Genetics. 2002, 18: 265-271. 10.1016/S0168-9525(02)02665-3.
Article CAS PubMed Google Scholar
Cleveland WS: Visualizing data. 1993, Murray Hill NJ:, Summit NJ: At&T Bell Laboratories ; Published by Hobart Press
Google Scholar
Allison DB, Cui X, Page GP, Sabripour M: Microarray data analysis: from disarray to consolidation and consensus. Nature Reviews Genetics. 2006, 7: 55-65. 10.1038/nrg1749.
Article CAS PubMed Google Scholar
Affymetrix – Latin Square Data. [http://www.affymetrix.com/support/technical/sample_data/datasets.affx]
Hoaglin DC, Mosteller F, Tukey JW: Understanding robust and exploratory data analysis. 1983, New York: Wiley
Google Scholar
Tufte ER: The visual display of quantitative information. 2001, Cheshire, CT: Graphics Press
Google Scholar
Esty W, Banfield J: The Box-Percentile Plot. Journal of Statistical Software. 2003, 8.17: 1-14.
Google Scholar
Venables WN, Ripley BD: Modern applied statistics with S-PLUS. 1999, Statistics and computing, New York: Springer-Verlag, 3
Book Google Scholar
Silverman BW: Density estimation for statistics and data analysis. 1986, London ; New York: Chapman and Hall
Book Google Scholar
Durbin B, Hardin J, Hawkins D, Rocke DM: A variance-stabilizing transformation for gene-expression microarray data. Bioinformatics. 2002, 18: S105-S110.
Article PubMed Google Scholar
Durbin B, Rocke DM: Estimation of transformation parameters for microarray data. Bioinformatics. 2003, 19: 1360-1367. 10.1093/bioinformatics/btg178.
Article CAS PubMed Google Scholar
Geller SC, Gregg JP, Hagerman P, Rocke DM: Transformation and normalization of oligonucleotide microarray data. Bioinformatics. 2003, 19: 1817-1823. 10.1093/bioinformatics/btg245.
Article CAS PubMed Google Scholar
Huber W, von Heydebreck A, Sultmann H, Poustka A, Vingron M: Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics. 2002, 1: 1-9.
Google Scholar
Rocke DM: Design and analysis of experiments with high throughput biological assay data. Seminars in Cell & Developmental Biology. 2004, 15: 703-713. 10.1016/j.semcdb.2004.09.007.
Article CAS Google Scholar
Baldi P, Long AD: A Bayesian framework for the analysis of microarray expression data: Regularized t-test and statistical inferences of gene changes. Bioinformatics. 2001, 17: 509-519. 10.1093/bioinformatics/17.6.509.
Article CAS PubMed Google Scholar
Jain N, Thatte J, Braciale T, Ley K, O'Connell M, Lee J: Local pooled error test for identifying differentially expressed genes with a small number of replicated microarrays. Bioinformatics. 2003, 19: 1945-1951. 10.1093/bioinformatics/btg264.
Article CAS PubMed Google Scholar
Cleveland WS, Devlin SJ: Locally Weighted Regression – an Approach to Regression-Analysis by Local Fitting. Journal of the American Statistical Association. 1988, 83: 596-610. 10.2307/2289282.
Article Google Scholar
Kendziorski C, Irizarry RA, Chen KS, Haag JD, Gould MN: On the utility of pooling biological samples in microarray experiments. Proc Natl Acad Sci U S A. 2005, 102 (12): 4252-7. 10.1073/pnas.0500607102.
Article PubMed Central CAS PubMed Google Scholar
Choe SE, Boutros M, Michelson AMCGM, Halfon MS: Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset. Genome Biology. 2005, 6:
Google Scholar
Affymetrix: Microarray Suite User Guide, Version 5. 2001
Google Scholar
Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003, 4: 249-264. 10.1093/biostatistics/4.2.249.
Article PubMed Google Scholar
Cope LM, Irizarry RA, Jaffee HA, Wu ZJ, Speed TP: A benchmark for affymetrix GeneChip expression measures. Bioinformatics. 2004, 20: 323-331. 10.1093/bioinformatics/btg410.
Article CAS PubMed Google Scholar
Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge YC, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang JH: Bioconductor: open software development for computational biology and bioinformatics. Genome Biology. 2004, 5:
Google Scholar
Li C, Wong WH: Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proceedings of the National Academy of Sciences of the United States of America. 2001, 98: 31-36. 10.1073/pnas.011404098.
Article PubMed Central CAS PubMed Google Scholar
Wu Z, Irizarry R, Gentleman R, Murillo F, Spencer F: A model based background adjustment for oligonucleotide expression arrays. 2004
Google Scholar
Zhou L, Rocke DM: An expression index for Affymetrix GeneChips based on the generalized logarithm. Bioinformatics. 2005, 21 (21): 3983-9. 10.1093/bioinformatics/bti665.
Article CAS PubMed Google Scholar
Dabney AR, Storey JD: A reanalysis of a published Affymetrix GeneChip control dataset. Genome Biology. 2006, 7:
Google Scholar

Download references

Acknowledgements

The authors thank Carl Murie for his helpful comments on the manuscript. This work was supported by Genome Quebec Phase I Bioinformatics Consortium funding (High-Throughput Gene Expression) and Genome Canada/Genome Quebec funding (Microarray Data Analysis for Class Comparison: Methods, Software and Pedagogy).

Author information

Authors and Affiliations

McGill University and Genome Quebec Innovation Centre, 740 avenue du Docteur Penfield, Montreal, Quebec, H3A 1A4, Canada
Owen Z Woody & Robert Nadon
David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
Owen Z Woody
Department of Human Genetics, McGill University, Montreal, Quebec, Canada
Robert Nadon

Authors

Owen Z Woody
View author publications
You can also search for this author in PubMed Google Scholar
Robert Nadon
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Robert Nadon.

Electronic supplementary material

13029_2006_6_MOESM1_ESM.txt

Additional file 1: shivcode.txt. This file contains shivplot script code for both R and S-PLUS. To prepare an R/S-PLUS session to produce shivplots, either copy the contents of this file into the input console of an instance of R/S-PLUS or read the code in using the source() command (see tutorial.txt). There are several prepackaged plots included as examples. The instructions to produce these plots can be reviewed at any time by calling the shivHelp command. (TXT 50 KB)

13029_2006_6_MOESM2_ESM.txt

Additional file 2: tutorial.txt. This file provides instructions on how to read custom data into R or S-PLUS, and provides some examples of how to call the shivplot function. This is a good place to start for those unfamiliar with R or S-PLUS. (TXT 4 KB)

13029_2006_6_MOESM3_ESM.txt

Additional file 3: example.txt. This file contains some artificial data in a flat file format that is easily manipulated by R and S-PLUS. It is used in the tutorial, and can be used as a file format template for those unfamiliar with the import utilities of R and S-PLUS. (TXT 2 KB)

13029_2006_6_MOESM4_ESM.txt

Additional file 4: alternate.txt. This file contains some artificial data in a flat file format that is easily manipulated by R and S-PLUS. It is used in the tutorial, and can be used as a file format template for those unfamiliar with the import utilities of R and S-PLUS. (TXT 2 KB)

13029_2006_6_MOESM5_ESM.zi

Additional file 5: source distribution of R shivplot package. This file is a compressed archive of the source files necessary to build the shivplot library in R. (ZI 13 KB)

13029_2006_6_MOESM6_ESM.addi

Additional file 6: Windows binary distribution of R shivplot package. This file is a compressed archive of the windows binary distribution of the shivplot library. (ADDI 81 KB)

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Authors’ original file for figure 4

Authors’ original file for figure 5

Authors’ original file for figure 6

Authors’ original file for figure 7

Authors’ original file for figure 8

Authors’ original file for figure 9

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Woody, O.Z., Nadon, R. The Shivplot: a graphical display for trend elucidation and exploratory analysis of microarray data. Source Code Biol Med 1, 6 (2006). https://doi.org/10.1186/1751-0473-1-6

Download citation

Received: 30 May 2006
Accepted: 08 November 2006
Published: 08 November 2006
DOI: https://doi.org/10.1186/1751-0473-1-6

The Shivplot: a graphical display for trend elucidation and exploratory analysis of microarray data

Abstract

Background

Results

Conclusion

Background

Component plots

Boxplots

Density plots

Variability-versus-intensity plots

Results and discussion

The Shivplot

Examples

Affymetrix spike-in data

Kendziorski et al. pooling data

Choe et al. Golden Spike data

Availability

Conclusion

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Electronic supplementary material

Authors’ original submitted files for images

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Source Code for Biology and Medicine

Contact us