For the same data set as before (datasetcombined), we can also compute means for each weather category. PROC MEANS can do this with either the CLASS or the BY statement. The main difference is that BY requires the data to be sorted by the grouping variable, while CLASS does not.
Generally, if you have large data sets, sorting the data first and then using a BY statement to compute the means tends to be more efficient.
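proc sort data=datasetcombined; by weather; run;  /* BY requires sorted input */
proc means data=datasetcombined;
  by weather;                                     /* one mean per weather category */
  var sales;                                      /* assumes the sales variable is named sales */
  output out=meansalesbyweather mean=msales;
run;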
PROC MEANS computes the mean of sales for each weather category. The data set 'datasetcombined' must first be sorted by the variable 'weather'. The computed means are written to the data set 'meansalesbyweather', where they are stored in the variable 'msales'.
Note: When you sort by a character variable such as weather, the values are ordered alphabetically. On ASCII systems, uppercase letters sort before lowercase ones.
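For comparison, the CLASS alternative mentioned at the start of this section needs no prior sort. The following is a minimal sketch, assuming the analysis variable is called sales. The output name meansalesbyclass is made up for illustration.

```sas
/* CLASS variant: the input does not have to be sorted by weather */
proc means data=datasetcombined nway;
  class weather;   /* grouping variable */
  var sales;       /* assumed name of the analysis variable */
  /* NWAY keeps only the per-category rows in the output data set; */
  /* without it, a row holding the overall mean is written as well */
  output out=meansalesbyclass mean=msales;
run;
```

The output data set has one row per weather category, just as with the BY version.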
Merge the means with the original data set
To merge the means for each category with the original data set, we use a DATA step with MERGE and BY.
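data datasetwithmeans;  /* output name is an assumption */
  merge meansalesbyweather datasetcombined;
  by weather;           /* match rows on the weather category */
run;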
Note: When you use MERGE with a BY statement, both data sets must be sorted by the BY variable.
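If you are not sure that both data sets are still in the right order, you can sort each one explicitly before merging. The sketch below does that and also uses the KEEP= data set option to drop the automatic variables _TYPE_ and _FREQ_ that PROC MEANS adds to its output data set. The output name datasetwithmeans is a hypothetical choice.

```sas
/* Sort both inputs by the BY variable before the merge */
proc sort data=meansalesbyweather; by weather; run;
proc sort data=datasetcombined;    by weather; run;

data datasetwithmeans;  /* hypothetical output name */
  /* keep only the key and the mean from the PROC MEANS output */
  merge meansalesbyweather(keep=weather msales) datasetcombined;
  by weather;
run;
```

Each observation of the original data now carries the mean of its weather category in msales.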