Coding schemes allow categorical variables to be used in multiple regression analysis and in more sophisticated models. The first part of this paper provides a basic guide to implementing coding schemes in the R statistical software. The popular coding schemes are easily built in R with functions such as contr.treatment, contr.sum, contr.helmert, contr.wec and contr.poly, for dummy, effect, reverse Helmert, weighted effect and polynomial coding respectively (contr.wec is provided by the wec package; the others ship with R). Interpreting their output is also straightforward and only requires knowing the type of comparison being made, the coefficients assigned to each level and their signs. The second part of this work evaluated the relative performance of four popular coding schemes (dummy, effect, reverse Helmert and weighted effect coding) using a Monte Carlo simulation. The effects of the effect size, the sample size, the number of levels of the factor, the distribution of the response variable (normal versus moderately non-normal) and the correction method for Type I error inflation were assessed using a per-contrast performance criterion (each contrast individually) and an overall performance criterion (all contrasts simultaneously). The simulations revealed that the correction method used for Type I error inflation had no effect on the per-contrast performance of dummy, effect, reverse Helmert and weighted effect coding. For all the schemes, this performance was affected only by the effect size, the number of levels and the type of distribution. The overall performance of the four schemes likewise varied with the effect size, the number of levels and the type of distribution. The correction method had a very slight effect on weighted effect coding and no influence on the other schemes.
Overall, the correction method had no influence on the performance of the coding schemes, whereas the effect size greatly affected it. The performance of these techniques also depended on whether the data followed a normal distribution. No specific pattern emerged for the number of levels or the sample size. Although these coding techniques do not imply the same type of comparisons and do not share the same internal structure, weighted effect coding was the least influenced of the four schemes.
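As a minimal sketch of the contrast functions named in the abstract, the R code below prints the coding matrices for a four-level factor. The factor g, the response y and the group sizes are made-up illustrations, not data from the thesis. Weighted effect coding is built by hand here (the object wec_manual is a hypothetical stand-in) so that the example runs with base R only; the wec package's contr.wec is the function the abstract refers to.

```r
# Hypothetical unbalanced factor with four levels (illustrative data only)
g <- factor(rep(c("a", "b", "c", "d"), times = c(10, 20, 30, 40)))
k <- nlevels(g)

# Dummy (treatment) coding: each level compared with the reference level "a"
print(contr.treatment(k))

# Effect (sum-to-zero) coding: each level compared with the unweighted mean
# of the level means; the last level is the omitted one
print(contr.sum(k))

# Reverse Helmert coding: each level compared with the mean of the
# preceding levels
print(contr.helmert(k))

# Orthogonal polynomial contrasts (linear, quadratic, cubic)
print(contr.poly(k))

# Weighted effect coding, built by hand (assumption: last level omitted).
# It matches effect coding except that the omitted level's row holds
# -n_j / n_omitted, so every column sums to zero over the observations.
n <- as.vector(table(g))
wec_manual <- rbind(diag(k - 1), -n[-k] / n[k])
dimnames(wec_manual) <- list(levels(g), levels(g)[-k])
print(wec_manual)

# Fit a linear model with the hand-built weighted effect coding
set.seed(1)
y <- rnorm(length(g), mean = as.integer(g))
contrasts(g) <- wec_manual
fit <- lm(y ~ g)
print(coef(fit))
```

Under this weighted coding the intercept equals the mean of all observations, and each remaining coefficient is the deviation of a level mean from that sample mean. With contr.sum, the intercept is instead the unweighted mean of the level means, which differs when the groups are unbalanced.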
RUFORUM Theses and Dissertations
RUFORUM (Grant No. RU 2015 GRG – 135)
Romain Lucas Glele Kakaï