*read data;
data butterfly;
input Region $ tst Ringlet PearlyHeath LargeSkipper;
cards;
U 0 5 3 0
U 0 1 0 0
U 0 5 0 0
U 0 10 4 0
U 0 19 1 0
U 0 8 1 0
U 0 14 3 0
U 0 1 1 0
U 0 10 0 0
U 0 9 0 0
U 1 1 1 1
U 2 12 3 0
U 2 10 1 1
U 3 3 2 0
U 4 7 12 0
U 8 2 1 0
U 8 13 1 1
U 8 11 4 0
U 9 9 1 0
U 11 7 4 0
U 11 19 3 0
U 11 12 4 0
U 12 10 0 0
U 15 14 4 0
U 20 9 2 0
U 21 10 4 0
U 21 6 5 1
U 23 40 7 1
U 23 24 0 0
S 3 0 0 0
S 10 3 0 0
S 14 11 0 5
S 21 12 0 3
S 3 1 0 0
S 8 12 0 1
S 14 8 0 5
S 22 28 0 1
S 3 0 0 0
S 5 0 0 1
S 16 8 0 0
S 14 5 0 0
S 5 36 0 0
S 15 10 0 0
S 23 4 0 0
S 4 2 0 0
S 5 8 0 0
S 25 1 0 0
S 0 18 0 1
S 0 11 0 2
S 0 1 0 1
S 0 1 0 0
S 0 1 0 1
S 0 1 0 0
S 0 8 0 0
S 0 2 0 0
S 0 0 0 0
;run;
*Make a plot over the counts of the Large Skipper against tst, indicate the two regions;
proc sgplot data=butterfly;
scatter x=tst y=LargeSkipper /group=region ;
run;
*The plots are difficult to read because there are many observations on the same spot.
Jittering the observations helps, but this is only available in SAS 9.4 (or needs to be coded);
proc sgplot data=butterfly;
scatter x=tst y=LargeSkipper /group=region jitter;
run;
* A histogram is useful to see the distribution;
proc sgplot data=butterfly;
histogram LargeSkipper;
run;
* A simple poisson model, we explain the counts of Large Skipper by region and tst;
proc genmod data=butterfly plot=all;
class region;
model LargeSkipper=region tst region*tst /dist=poisson type3;
run;
* Effect plots can be made to illustrate the influence of tst;
proc genmod data=butterfly plot=all;
class region;
model LargeSkipper=region tst region*tst /dist=poisson type3;
effectplot fit (plotby=region);
run;
* Maybe assuming linearity for the variable tst is strong and we could try to use a categorical variable instead.
For example,
eco=0 :ecological farming is not used or has been used less than 5 years
eco=1 : ecological farming has been used at least 5 years and at most 10 years
eco=2 : ecological farming has been used for more than 10 years;
data butterfly1;
set butterfly;
eco=0;
if tst>=5 & tst <=10 then eco=1;
if tst>10 then eco=2;
run;
*The model;
proc genmod data=butterfly1 plots=all;
class region eco;
model LargeSkipper=region eco region*eco/dist=poisson type3;
run;
*Including means for all combinations of eco and region;
proc genmod data=butterfly1 plots=all;
class region eco;
model LargeSkipper=region eco region*eco/dist=poisson type3;
lsmeans region region*eco /pdiff ilink;
run;
*Note: This model can be fit also with the GLIMMIX procedure, the code looks similar, but there are
some differerences. The GLIMMIX should be used if there are random faktors (e.g. hierarchical sampling or repeated measurments in time);
proc glimmix data=butterfly1 plots=all;
class region eco;
model LargeSkipper=region eco region*eco /dist=poisson s;
run;