Regression analysis is a conceptually simple method for investigating functional relationships

among variables. A real estate appraiser may wish to relate the sale price

of a home from selected physical characteristics of the building and taxes (local,

school, county) paid on the building. We may wish to examine whether cigarette

consumption is related to various socioeconomic and demographic variables such

as age, education, income, and price of cigarettes. The relationship is expressed in

the form of an equation or a model connecting the response or dependent variable

and one or more explanatory or predictor variables. In the cigarette consumption

example, the response variable is cigarette consumption (measured by the number

of packs of cigarette sold in a given state on a per capita basis during a given year)

and the explanatory or predictor variables are the various socioeconomic and demographic

variables. In the real estate appraisal example, the response variable is

the price of a home and the explanatory or predictor variables are the characteristics

of the building and taxes paid on the building.

We denote the response variable by Y and the set of predictor variables by

X I , Xp, . . ., X,, where p denotes the number of predictor variables. The true

relationship between Y and X I , Xp, . . . , X , can be approximated by the regression


y = f ( X l 1x 21. . . , X,) + € 1

where E is assumed to be a random error representing the discrepancy in the

approximation. It accounts for the failure of the model to fit the data exactly. The

function f(X1X,2 , . . . ,X , ) describes the relationship between Y and XI,X2 ,. . .,

X,. An example is the linear regression model

Y = Po + p1x1 + p2x2 + . . . + ppx, + E l (1 .a

where pol ,& . . . , ,LIP, called the regression parameters or coefficients, are unknown

constants to be determined (estimated) from the data. We follow the commonly

used notational convention of denoting unknown parameters by Greek letters.

The predictor or explanatory variables are also called by other names such as

independent variables, covariates, regressors, factors, and carriers. The name

independent variable, though commonly used, is the least preferred, because in

practice the predictor variables are rarely independent of each other.