Playing it simple: a guide to the Probit Model

You don’t have to be a rocket scientist to understand the Probit Model

In layman’s terms, a probit model is simply a popular specification for an ordinal or binary response model that employs a probit link function. The model is most often estimated using the standard maximum likelihood procedure; such an estimation is called a probit regression.

To many of you, this is quite simple, and I apologize if I’m treading familiar ground, but I want to make it as clear as possible for those new to marketing lingo, because it is such an important concept in the industry at the moment. Basic, but important.

To introduce it straightforwardly, suppose the response variable Y is binary, that is, it can have only two possible outcomes, which we will denote as 1 and 0. For example, Y may represent the presence/absence of a certain condition, the success/failure of some device, a yes/no answer on a survey, etc.

Got it? Ha ha, “yes, Ted, we realise 1 + 1 = 2”. Okay, then let us continue (sorry to bore you with this!)

We also have a vector of regressors X, which are assumed to influence the outcome Y. (duh) Specifically, we assume that the model takes the form:

$$\Pr(Y = 1 \mid X) = \Phi(X'\beta),$$

where Pr denotes probability, and Φ is the cumulative distribution function (CDF) of the standard normal distribution. The parameters β are typically estimated by maximum likelihood.
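To make the formula Pr(Y = 1 | X) = Φ(X'β) concrete, here is a minimal sketch using only the Python standard library. The function name `probit_probability` and the coefficient values are my own illustrative assumptions, not anything estimated from real data.

```python
from statistics import NormalDist


def probit_probability(x, beta):
    """Return Pr(Y = 1 | X = x) = Phi(x' beta) for a single observation."""
    index = sum(xj * bj for xj, bj in zip(x, beta))  # linear index x' beta
    return NormalDist().cdf(index)                   # standard normal CDF


# Hypothetical coefficients: an intercept and one regressor.
beta = [-0.5, 1.2]

# The first element of x is the constant 1 that multiplies the intercept.
print(probit_probability([1.0, 0.0], beta))
print(probit_probability([1.0, 1.0], beta))
```

Because Φ is increasing, a positive coefficient means a larger regressor value pushes the probability of Y = 1 up.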

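It is also possible to motivate the probit model as a latent variable model. Suppose there exists an auxiliary random variable

$$Y^* = X'\beta + \varepsilon,$$

where ε ~ N(0, 1). Then Y can be viewed as an indicator for whether this latent variable is positive:

$$Y = \mathbf{1}\{Y^* > 0\} = \begin{cases} 1 & \text{if } Y^* > 0, \\ 0 & \text{otherwise,} \end{cases}$$

so that Pr(Y = 1 | X) = Pr(ε > −X'β) = Φ(X'β), by the symmetry of the normal distribution.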
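A quick simulation can check the latent variable story: draw ε from N(0, 1), add it to a fixed index X'β, and count how often the sum is positive. This is only a sketch; the index value 0.7, the function name `simulate_share_positive`, and the seed are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist


def simulate_share_positive(index, n_draws=200_000, seed=0):
    """Share of draws in which the latent variable index + eps is positive."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_draws) if index + rng.gauss(0.0, 1.0) > 0)
    return hits / n_draws


index = 0.7  # hypothetical value of x' beta
simulated = simulate_share_positive(index)
theoretical = NormalDist().cdf(index)  # Phi(x' beta)
print(simulated, theoretical)
```

The two printed numbers should agree up to simulation noise, which shrinks as `n_draws` grows.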

Again, apologies if you feel I’m treating you condescendingly; it’s not my intention. I am merely using the most fundamental concepts of the model to prove a point: it is that easy.

Okay, now we move on to the Maximum likelihood estimation (groans… “give us something that challenges us Ted!” Again, sorry, but we need to keep it simple so everyone’s on the same footing, school kids might be reading!)

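Suppose the data set {y_i, x_i}, i = 1, …, n, contains n independent statistical units corresponding to the model above. Then their joint log-likelihood function is

$$\ln \mathcal{L}(\beta) = \sum_{i=1}^{n} \Big( y_i \ln \Phi(x_i'\beta) + (1 - y_i) \ln\big(1 - \Phi(x_i'\beta)\big) \Big).$$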

The estimator which maximizes this function will be consistent, asymptotically normal and efficient provided that E[XX’] exists and is not singular. It can be shown that this log-likelihood function is globally concave in β, and therefore standard numerical algorithms for optimization will converge rapidly to the unique maximum (…you don’t say!)
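For the curious, here is one way the maximization could be sketched in plain Python. I use Fisher scoring (a Newton-type iteration), which is my choice of algorithm rather than anything prescribed above. The data are simulated, and `true_beta`, `fit_probit` and the helper `solve` are hypothetical names. There is no guard against Φ hitting exactly 0 or 1, so treat it as a toy.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def log_likelihood(beta, xs, ys):
    """Sum over i of y_i ln Phi(x_i'b) + (1 - y_i) ln(1 - Phi(x_i'b))."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = STD_NORMAL.cdf(sum(a * b for a, b in zip(x, beta)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_probit(xs, ys, steps=25):
    """Fisher scoring: beta <- beta + (expected information)^-1 * score."""
    k = len(xs[0])
    beta = [0.0] * k
    for _ in range(steps):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, y in zip(xs, ys):
            idx = sum(a * b for a, b in zip(x, beta))
            p, d = STD_NORMAL.cdf(idx), STD_NORMAL.pdf(idx)
            s = d * (y - p) / (p * (1.0 - p))  # per-observation score weight
            w = d * d / (p * (1.0 - p))        # per-observation information weight
            for i in range(k):
                score[i] += s * x[i]
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = solve(info, score)
        beta = [b + delta for b, delta in zip(beta, step)]
    return beta


# Simulated data from the latent variable model (all values hypothetical).
rng = random.Random(1)
true_beta = [-0.3, 0.8]
xs = [[1.0, rng.gauss(0.0, 1.0)] for _ in range(2000)]
ys = [1 if sum(a * b for a, b in zip(x, true_beta)) + rng.gauss(0.0, 1.0) > 0 else 0
      for x in xs]

beta_hat = fit_probit(xs, ys)
print(beta_hat)
```

With a couple of thousand simulated observations the estimate should land close to `true_beta`, though never exactly on it.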

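The asymptotic distribution of the estimator β̂ is given by

$$\sqrt{n}\,(\hat\beta - \beta) \xrightarrow{d} \mathcal{N}(0, \Omega^{-1}),$$

where

$$\Omega = E\left[ \frac{\varphi^2(X'\beta)}{\Phi(X'\beta)\big(1 - \Phi(X'\beta)\big)}\, XX' \right],$$

and φ = Φ’ is the probability density function (PDF) of the standard normal distribution.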
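In practice the asymptotic variance is usually turned into standard errors by replacing the expectation with a sample average evaluated at the estimate. The sketch below assumes that plug-in approach. The names `omega_hat`, `invert` and `standard_errors`, the coefficient values and the tiny design are all hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def omega_hat(beta, xs):
    """Sample average of phi^2 / (Phi (1 - Phi)) * x x', evaluated at beta."""
    k, n = len(beta), len(xs)
    omega = [[0.0] * k for _ in range(k)]
    for x in xs:
        idx = sum(a * b for a, b in zip(x, beta))
        p, d = STD_NORMAL.cdf(idx), STD_NORMAL.pdf(idx)
        w = d * d / (p * (1.0 - p))
        for i in range(k):
            for j in range(k):
                omega[i][j] += w * x[i] * x[j] / n
    return omega


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [row[n:] for row in m]


def standard_errors(beta, xs):
    """Square roots of the diagonal of (omega_hat)^-1 / n."""
    cov = invert(omega_hat(beta, xs))
    n = len(xs)
    return [math.sqrt(cov[i][i] / n) for i in range(len(beta))]


# Hypothetical estimate and design: intercept plus one regressor.
beta_hat = [-0.3, 0.8]
xs = [[1.0, x] for x in (-1.0, 0.0, 1.0, 2.0)] * 50
print(standard_errors(beta_hat, xs))
```

A handy sanity check: with an intercept only and β = 0, the weight φ²/(Φ(1 − Φ)) equals 2/π, so the standard error reduces to the square root of π/(2n).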

I know, I know, easy stuff. But the next bit is where it gets a bit more challenging (and I do stress a bit, sorry), and that is Berkson’s minimum chi-square method. This method can be applied only when there are many observations of the response variable y_i sharing the same value of the vector of regressors x_i (such a situation may be referred to as “many observations per cell”). More specifically, the model can be formulated as follows.

Suppose that among the n observations {y_i, x_i}, i = 1, …, n, there are only T distinct values of the regressors, which can be denoted as {x_(1), …, x_(T)}. Let n_t be the number of observations with x_i = x_(t), and r_t the number of observations with x_i = x_(t) and y_i = 1. We assume that there are indeed “many” observations per “cell”: for each group t, n_t/n → c_t > 0 as n → ∞.

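Denote

$$\hat p_t = \frac{r_t}{n_t}, \qquad \hat\sigma_t^2 = \frac{1}{n_t}\, \frac{\hat p_t (1 - \hat p_t)}{\varphi^2\big(\Phi^{-1}(\hat p_t)\big)}.$$

Then Berkson’s minimum chi-square estimator is a generalized least squares estimator in a regression of Φ^{-1}(p̂_t) on x_(t) with weights σ̂_t^{-2}:

$$\hat\beta = \Big( \sum_{t=1}^{T} \hat\sigma_t^{-2}\, x_{(t)} x_{(t)}' \Big)^{-1} \sum_{t=1}^{T} \hat\sigma_t^{-2}\, x_{(t)}\, \Phi^{-1}(\hat p_t).$$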

It can be shown that this estimator is consistent (as n→∞ with T fixed), asymptotically normal and efficient. Its advantage is the presence of a closed-form formula for the estimator. However, it is only meaningful to carry out this analysis when individual observations are not available, only their aggregated counts r_t, n_t and x_(t) (for example in the analysis of voting behavior).
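Since Berkson’s estimator has a closed form, it fits in a few lines. The sketch below assumes the usual formulation: regress Φ^{-1}(r_t/n_t) on x_(t) by weighted least squares, with weights equal to the inverse of the estimated variance (1/n_t) p̂_t(1 − p̂_t)/φ²(Φ^{-1}(p̂_t)). The cell counts are invented, and `berkson_probit` and `solve` are hypothetical names.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def berkson_probit(cells):
    """Weighted least squares of Phi^-1(r_t / n_t) on x_t.

    cells is a list of (x_t, n_t, r_t) with 0 < r_t < n_t; a cell with
    r_t = 0 or r_t = n_t makes inv_cdf raise an error.
    """
    k = len(cells[0][0])
    xwx = [[0.0] * k for _ in range(k)]
    xwz = [0.0] * k
    for x, n_t, r_t in cells:
        p = r_t / n_t
        z = STD_NORMAL.inv_cdf(p)                             # Phi^-1(p_t)
        var = p * (1.0 - p) / (n_t * STD_NORMAL.pdf(z) ** 2)  # sigma_t^2
        w = 1.0 / var                                         # weight
        for i in range(k):
            xwz[i] += w * x[i] * z
            for j in range(k):
                xwx[i][j] += w * x[i] * x[j]
    return solve(xwx, xwz)


# Invented aggregated data: (x_t, n_t, r_t), intercept plus one regressor.
cells = [
    ([1.0, -1.0], 200, 28),
    ([1.0, 0.0], 200, 77),
    ([1.0, 1.0], 200, 140),
    ([1.0, 2.0], 200, 181),
]
print(berkson_probit(cells))
```

Note that the whole thing needs only the T cells, never the individual observations, which is exactly the situation where the method earns its keep.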

Well, there you have it, the Probit Model. For some of you (or should I say most of you) this would have been child’s play. It really is a pretty basic concept, comparable to the multiplication table in maths, or the Solfège technique in music. But, for those new to marketing it’s a good starting point and a must if you’re really going to understand the basic concepts of marketing.

– Ted Anthony