Compute the means of different categories

Last changed: 03 October 2019

For the same data set as before ('datasetcombined', from the example in subsetting and merging datasets), we can also compute the mean for each weather category. Within PROC MEANS this can be done with either the CLASS or the BY statement. The main difference is that BY requires the data to be sorted by the grouping variable, whereas CLASS does not.

For large data sets, sorting the data first and then using a BY statement can be more memory-efficient, because PROC MEANS then processes one BY group at a time.

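/* sort by the grouping variable so that BY processing works */
proc sort data=datasetcombined;
by weather;
run;

/* mean of sales within each weather category */
proc means data=datasetcombined;
var sales;
by weather;
output out=meansalesbyweather mean=msales;
run;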


PROC MEANS computes the mean of 'sales' for each weather category. Because a BY statement is used, the data set 'datasetcombined' must first be sorted by the variable 'weather'. The computed means are written to the data set 'meansalesbyweather', in the variable 'msales'.

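Note: When you sort data by a character variable, like weather, the values are sorted in alphabetical order. The comparison is case-sensitive: on ASCII systems, uppercase letters sort before lowercase letters.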
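
For comparison, the CLASS variant mentioned at the start could look roughly like the following. This is a minimal sketch, not part of the original example: the output name 'meansalesbyclass' is a hypothetical name chosen for illustration, and the NWAY option is added on the assumption that only the per-category rows are wanted (without it, the output data set also contains a row for the overall mean).

```sas
/* no PROC SORT needed: CLASS does not require sorted input */
proc means data=datasetcombined nway;
class weather;
var sales;
/* 'meansalesbyclass' is a hypothetical output name */
output out=meansalesbyclass mean=msales;
run;
```

The resulting data set should hold one row per weather category, just like 'meansalesbyweather'.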

Merge the means with the original data set 

To merge the means for each category with the original data set we use a data step with MERGE and BY.

data meanandoriginal;
merge meansalesbyweather datasetcombined;
by weather;
run;

Note: When you use MERGE with a BY statement, both data sets must be sorted by the BY variable.
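
The data set written by OUTPUT OUT= also contains the automatic variables _TYPE_ and _FREQ_, and a plain merge carries them into the result. If they are not wanted, one possible variant is sketched below. The name 'meanandoriginal2' is a hypothetical name used only for this illustration, and the sketch assumes both input data sets are already sorted by 'weather' as described above.

```sas
/* merge the means back, leaving out the automatic variables */
data meanandoriginal2;   /* hypothetical name */
merge meansalesbyweather(drop=_type_ _freq_) datasetcombined;
by weather;
run;

/* print the first rows to check that msales repeats within each category */
proc print data=meanandoriginal2(obs=10);
run;
```

Each original observation should then carry the mean of its own weather category in 'msales'.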