To compute the overall mean of a variable we use the SAS procedure PROC MEANS. Means can also be computed with the RETAIN statement in a DATA step, but that is considerably more complicated.
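For comparison, here is a minimal sketch of the RETAIN approach. It assumes a small made-up data set, toy, standing in for datasetcombined; the names toy, meansales_retain, total and n are hypothetical. The running sum and count are kept across DATA step iterations with RETAIN, and the mean is written once, on the last observation (END=last). Like PROC MEANS, it skips missing values of sales.

```sas
/* Hypothetical toy data standing in for datasetcombined (values are made up) */
data toy;
  input date :date9. weather $ sales;
  format date date9.;
  datalines;
01OCT2006 sunny 10
02OCT2006 rainy 20
03OCT2006 cloudy 30
04OCT2006 sunny 40
;
run;

/* Mean of sales accumulated with RETAIN; one observation is written at the end */
data meansales_retain(keep=msales);
  set toy end=last;          /* last = 1 on the final observation */
  retain total 0 n 0;        /* keep running sum and count across iterations */
  if not missing(sales) then do;
    total = total + sales;   /* skip missing values, as PROC MEANS does */
    n = n + 1;
  end;
  if last then do;
    if n > 0 then msales = total / n;
    output;                  /* explicit OUTPUT: only this row is written */
  end;
run;

proc print data=meansales_retain;
run;
```

With the four toy values, the printed msales is 25. The extra bookkeeping (sum, count, end-of-file flag, explicit OUTPUT) is what makes this approach more involved than a single PROC MEANS call.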
We use the data set 'datasetcombined' from the example on subsetting and merging data sets. The mean of the variable 'sales' is computed, and the variable containing this mean is called 'msales'.
proc means data=datasetcombined;
var sales;
output out=meansales mean(sales)=msales;
run;
PROC MEANS is called. This procedure computes means, sums, standard deviations and other summary statistics. The DATA= option names the data set in use, here datasetcombined.
The VAR statement specifies the variables for which the statistics are computed. In this case the variable is sales.
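The OUTPUT statement specifies the data set in which the results are stored (OUT=) and the names of the new variables. Here the new data set is called 'meansales', and mean(sales)=msales stores the mean of sales in a new variable called 'msales'. Besides msales, the output data set also contains the automatic variables _TYPE_ and _FREQ_. To see the contents of meansales, print it: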
proc print data=meansales;
run;
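A self-contained variant can be run without the datasetcombined example. This sketch assumes a made-up data set, toy, with the same variables date, weather and sales. It adds two options that are not part of the example above: NOPRINT suppresses the printed summary table, and the DROP= data set option removes the automatic variables _TYPE_ and _FREQ_ from meansales, so that only msales is carried along later.

```sas
/* Hypothetical toy data standing in for datasetcombined (values are made up) */
data toy;
  input date :date9. weather $ sales;
  format date date9.;
  datalines;
01OCT2006 sunny 10
02OCT2006 rainy 20
03OCT2006 cloudy 30
04OCT2006 sunny 40
;
run;

/* NOPRINT: results go only to the output data set meansales */
proc means data=toy noprint;
  var sales;
  output out=meansales(drop=_type_ _freq_) mean(sales)=msales;
run;

proc print data=meansales;
run;
```

The printout should show a single observation with msales equal to 25.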
Merge the overall mean with the original data set
Sometimes it is useful to use differences from the overall mean as observations in an analysis. To do this, we first need to attach the overall mean to every observation of the data set.
data meanandoriginal; set datasetcombined;
if _N_=1 then set meansales; run;
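We create a new data set 'meanandoriginal' in a DATA step that reads 'datasetcombined' with a SET statement. To get the overall mean on every row of 'datasetcombined' we use if _N_=1. _N_ is an automatic SAS variable that counts the iterations of the DATA step, so meansales is read only on the first iteration; because variables read with SET are retained, msales keeps its value on all following rows. Without the condition, the second iteration would try to read past the end of the one-row data set meansales, which stops the DATA step after a single observation. If we instead write if _N_=4, rows 1 to 3 get a missing value for msales and the overall mean appears from row 4 onward.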
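The following sketch attaches the overall mean to every row and computes the difference from it, the use case described at the start of this section. It assumes the same kind of made-up toy data standing in for datasetcombined; diffsales is a hypothetical name for the deviation sales - msales.

```sas
/* Hypothetical toy data standing in for datasetcombined (values are made up) */
data toy;
  input date :date9. weather $ sales;
  format date date9.;
  datalines;
01OCT2006 sunny 10
02OCT2006 rainy 20
03OCT2006 cloudy 30
04OCT2006 sunny 40
;
run;

/* Overall mean of sales in a one-row data set */
proc means data=toy noprint;
  var sales;
  output out=meansales(drop=_type_ _freq_) mean(sales)=msales;
run;

/* Attach the mean to every row and compute the deviation from it */
data meanandoriginal;
  set toy;
  if _N_ = 1 then set meansales;  /* read once; msales is retained */
  diffsales = sales - msales;     /* difference from the overall mean */
run;

proc print data=meanandoriginal;
  var date weather sales msales diffsales;
run;
```

With these toy values, msales is 25 on every row and diffsales runs -15, -5, 5, 15.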
To print the new data set we use PROC PRINT as usual. To make the output easier to read, the VAR statement can list the variables in an order that puts sales and msales next to each other. A TITLE statement adds a heading to the printout. Titles stay in effect until they are changed or cleared, so after running the program, submit an empty title; statement to remove the heading from later printouts and graphs.
proc print data=meanandoriginal;
var date weather sales msales;
title 'Sales in October 2006';
run;
title;