This post provides code to implement inductive selection/specification of log-linear models for contingency tables via the Lasso. I will exemplify the step-by-step implementation using data on intergenerational occupational mobility for 16 countries.
glmnet
, which aperforms Lasso or elastic-net regularization path for generalized linear models. Other packages for data manipulation will be also used. library("glmnet")
library("tidyverse")
library("modelr")
library("reshape2")
hg16
from the package logmult
. This is a cross-classification of subject’s occupational status (destination) and his father’s occupational status (origin) across 16 countries. # Inpute data and create contingency table as data.frame()
library("logmult")
data(hg16)
table <- ftable(hg16)
mydata <- as.data.frame(table)
names(mydata) <- c("origin","destination","country","Freq")
This is what the data looks like:
# Set reference categories
mydata$origin <- relevel(mydata$origin,ref = "Farm")
mydata$destination <- relevel(mydata$destination,ref = "Farm")
mydata$country <- relevel(mydata$country,ref = "Spain")
# Outcome variable is the frequencies in the contingency table
y <- mydata$Freq + 0.5 # add small constant to avoid problems with empty cells.
# Creates design matrix with all three-way interactions
formula <- as.formula( ~ .*.*.)
x <- model.matrix(formula, mydata[,c("origin","destination","country")])
# Penalty factors for weighted penalties.
# Here all variables have a penalty of 1, which is equivalent to unweighted penalties.
w <- rep(1,dim(x)[2])
# Unsilence the lines below if you wish to implement the "adaptive lasso" based on ridge estimates
#ridge.cv <- cv.glmnet(x,y,alpha=0,family='poisson', nfolds = 10)
#best_ridge_coef <- as.numeric(coef(ridge.cv, s = ridge.cv$lambda.min))[-1]
#w <- 1/(abs(best_ridge_coef))^(1)
glmnet
package. In this case you should set the parameter \(\alpha=1\) so that you use a Lasso penalty (alternatively, \(\alpha=0\) corresponds to a ridge penalty and \(\alpha=0.5\) is elastic nets). Because the dependent variable is a vector of frequencies in a contingency table I use a Poisson family with a log-link function. # "x" is the design matrix for the saturated model
# "y" is the dependent variable
# "alpha=1" indicates the use of a Lasso penalty
# "family=poisson" uses a log-link function by default
# "penalty.factor = w specifies the weight for each variable's penalty
# Fit model
lasso <- glmnet(x,y,alpha=1,family='poisson', penalty.factor = w)
# Plot the path of coefficients for values of lambda
plot(lasso,xvar="lambda")
# Cross-validation of Lambda
lasso.cv <- cv.glmnet(x,y,alpha=1,family='poisson', nfolds = 10, penalty.factor = w)
# Plot cross-validation error for each value of lambda
plot(lasso.cv)
# Extract optimum value of lambda
opt.lam <- c(lasso.cv$lambda.1se)
lasso.coefs <- coef(lasso.cv, s = opt.lam)
print(lasso.coefs)
145 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) 9.12085673
(Intercept) .
originWhite Collar -2.08988473
originBlue Collar -1.91262871
destinationWhite Collar -0.64397157
destinationBlue Collar -0.13131034
countryAustralia -3.08419718
countryBelgium -3.15994469
countryFrance -1.71517333
countryHungary -0.97210452
countryItaly -3.29253459
countryJapan -2.94361973
countryPhilippines -1.17843790
countryUnited States -2.03696358
countryWest Germany -1.43042763
countryWest Malaysia -1.47979006
countryYugoslavia -3.78262426
countryDenmark -4.03057779
countryFinland -4.09436467
countryNorway -4.05957652
countrySweden -4.03704431
originWhite Collar:destinationWhite Collar 2.49071711
originBlue Collar:destinationWhite Collar 1.61331189
originWhite Collar:destinationBlue Collar 0.73290796
originBlue Collar:destinationBlue Collar 1.95783664
originWhite Collar:countryAustralia .
originBlue Collar:countryAustralia 0.07156901
originWhite Collar:countryBelgium .
originBlue Collar:countryBelgium .
originWhite Collar:countryFrance .
originBlue Collar:countryFrance .
originWhite Collar:countryHungary -1.03005913
originBlue Collar:countryHungary .
originWhite Collar:countryItaly .
originBlue Collar:countryItaly .
originWhite Collar:countryJapan .
originBlue Collar:countryJapan .
originWhite Collar:countryPhilippines -0.54757703
originBlue Collar:countryPhilippines -0.69820992
originWhite Collar:countryUnited States .
originBlue Collar:countryUnited States .
originWhite Collar:countryWest Germany 0.09309360
originBlue Collar:countryWest Germany .
originWhite Collar:countryWest Malaysia .
originBlue Collar:countryWest Malaysia -0.26259721
originWhite Collar:countryYugoslavia .
originBlue Collar:countryYugoslavia .
originWhite Collar:countryDenmark .
originBlue Collar:countryDenmark .
originWhite Collar:countryFinland .
originBlue Collar:countryFinland .
originWhite Collar:countryNorway .
originBlue Collar:countryNorway .
originWhite Collar:countrySweden .
originBlue Collar:countrySweden .
destinationWhite Collar:countryAustralia .
destinationBlue Collar:countryAustralia .
destinationWhite Collar:countryBelgium .
destinationBlue Collar:countryBelgium .
destinationWhite Collar:countryFrance .
destinationBlue Collar:countryFrance .
destinationWhite Collar:countryHungary -0.74394797
destinationBlue Collar:countryHungary .
destinationWhite Collar:countryItaly .
destinationBlue Collar:countryItaly .
destinationWhite Collar:countryJapan .
destinationBlue Collar:countryJapan -0.02763977
destinationWhite Collar:countryPhilippines -1.34939116
destinationBlue Collar:countryPhilippines -1.28669413
destinationWhite Collar:countryUnited States 0.14914382
destinationBlue Collar:countryUnited States 0.47475214
destinationWhite Collar:countryWest Germany .
destinationBlue Collar:countryWest Germany -0.16409332
destinationWhite Collar:countryWest Malaysia -1.08710815
destinationBlue Collar:countryWest Malaysia -0.93576951
destinationWhite Collar:countryYugoslavia .
destinationBlue Collar:countryYugoslavia .
destinationWhite Collar:countryDenmark .
destinationBlue Collar:countryDenmark .
destinationWhite Collar:countryFinland .
destinationBlue Collar:countryFinland .
destinationWhite Collar:countryNorway .
destinationBlue Collar:countryNorway .
destinationWhite Collar:countrySweden .
destinationBlue Collar:countrySweden .
originWhite Collar:destinationWhite Collar:countryAustralia .
originBlue Collar:destinationWhite Collar:countryAustralia .
originWhite Collar:destinationBlue Collar:countryAustralia .
originBlue Collar:destinationBlue Collar:countryAustralia 0.16874354
originWhite Collar:destinationWhite Collar:countryBelgium 0.21522315
originBlue Collar:destinationWhite Collar:countryBelgium 0.17705495
originWhite Collar:destinationBlue Collar:countryBelgium .
originBlue Collar:destinationBlue Collar:countryBelgium .
originWhite Collar:destinationWhite Collar:countryFrance 0.42074875
originBlue Collar:destinationWhite Collar:countryFrance 0.24230301
originWhite Collar:destinationBlue Collar:countryFrance 0.91479074
originBlue Collar:destinationBlue Collar:countryFrance 0.39779339
originWhite Collar:destinationWhite Collar:countryHungary .
originBlue Collar:destinationWhite Collar:countryHungary 0.35107067
originWhite Collar:destinationBlue Collar:countryHungary .
originBlue Collar:destinationBlue Collar:countryHungary -0.14851797
originWhite Collar:destinationWhite Collar:countryItaly .
originBlue Collar:destinationWhite Collar:countryItaly .
originWhite Collar:destinationBlue Collar:countryItaly .
originBlue Collar:destinationBlue Collar:countryItaly .
originWhite Collar:destinationWhite Collar:countryJapan .
originBlue Collar:destinationWhite Collar:countryJapan .
originWhite Collar:destinationBlue Collar:countryJapan .
originBlue Collar:destinationBlue Collar:countryJapan -0.12605413
originWhite Collar:destinationWhite Collar:countryPhilippines .
originBlue Collar:destinationWhite Collar:countryPhilippines .
originWhite Collar:destinationBlue Collar:countryPhilippines .
originBlue Collar:destinationBlue Collar:countryPhilippines .
originWhite Collar:destinationWhite Collar:countryUnited States 0.34321015
originBlue Collar:destinationWhite Collar:countryUnited States 1.02222311
originWhite Collar:destinationBlue Collar:countryUnited States 0.18588765
originBlue Collar:destinationBlue Collar:countryUnited States 0.37988355
originWhite Collar:destinationWhite Collar:countryWest Germany 0.62405862
originBlue Collar:destinationWhite Collar:countryWest Germany 0.05653652
originWhite Collar:destinationBlue Collar:countryWest Germany 0.46213902
originBlue Collar:destinationBlue Collar:countryWest Germany .
originWhite Collar:destinationWhite Collar:countryWest Malaysia -0.04383566
originBlue Collar:destinationWhite Collar:countryWest Malaysia .
originWhite Collar:destinationBlue Collar:countryWest Malaysia .
originBlue Collar:destinationBlue Collar:countryWest Malaysia -0.16241454
originWhite Collar:destinationWhite Collar:countryYugoslavia .
originBlue Collar:destinationWhite Collar:countryYugoslavia .
originWhite Collar:destinationBlue Collar:countryYugoslavia .
originBlue Collar:destinationBlue Collar:countryYugoslavia .
originWhite Collar:destinationWhite Collar:countryDenmark .
originBlue Collar:destinationWhite Collar:countryDenmark .
originWhite Collar:destinationBlue Collar:countryDenmark .
originBlue Collar:destinationBlue Collar:countryDenmark .
originWhite Collar:destinationWhite Collar:countryFinland .
originBlue Collar:destinationWhite Collar:countryFinland .
originWhite Collar:destinationBlue Collar:countryFinland .
originBlue Collar:destinationBlue Collar:countryFinland .
originWhite Collar:destinationWhite Collar:countryNorway .
originBlue Collar:destinationWhite Collar:countryNorway .
originWhite Collar:destinationBlue Collar:countryNorway .
originBlue Collar:destinationBlue Collar:countryNorway .
originWhite Collar:destinationWhite Collar:countrySweden .
originBlue Collar:destinationWhite Collar:countrySweden .
originWhite Collar:destinationBlue Collar:countrySweden .
originBlue Collar:destinationBlue Collar:countrySweden .