# Pre-regression Diagnostics: Perform a descriptive analysis of the dependent variable

General Instructions:

Everyone must follow the same general process for model building and analysis, applied to the dataset and model assigned to them according to the last digit of their student ID number. Details of the individual datasets are given on the pages below. The datasets are taken from the Wooldridge collection and are posted on Moodle alongside this document.

In general, your task is to produce an empirical econometric analysis in either Gretl or Stata.

The general process must be guided as follows:

1. Pre-regression Diagnostics: Perform a descriptive analysis of the dependent variable.

• Describe the dataset, variables used, their datatypes and units of measurement.

• Plot a histogram of the dependent variable and describe its distribution.

• Report and discuss descriptive statistics.

• Produce a correlation matrix and discuss results.
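
The assignment itself is to be done in Gretl or Stata, so the following Python sketch is only an illustration of what the step 1 quantities are. The toy lists `y` and `x` and the helper names `describe`, `histogram_counts` and `correlation_matrix` are made up for this example and are not part of any assigned dataset.

```python
import math
import statistics

# Hypothetical toy sample; replace with the columns of your assigned dataset.
y = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]


def describe(values):
    """Return the basic descriptive statistics usually reported for a variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
        "min": min(values),
        "max": max(values),
    }


def histogram_counts(values, bins):
    """Count observations in `bins` equal-width intervals (assumes max > min)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The largest observation is placed in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def pearson(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns):
    """Pairwise correlations for a dict mapping variable name to its values."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}


summary = describe(y)
corr = correlation_matrix({"y": y, "x": x})
```

In the written report, the numbers are only the starting point: the discussion should cover skewness, outliers, and which regressors are strongly correlated with the dependent variable and with each other.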

2. Regression: Perform a regression analysis of the given model. Give an economic interpretation of the estimated coefficients.

• Describe the overall goodness of fit of the model ($R^2$ and F-statistic), and comment on the statistical significance of each regressor.

• Give an economic interpretation of the estimated coefficients (interpret only those that are significantly different from zero at the 5% level or stricter).

• Illustrate your regression model on a graph.

• Provide predictions specified by your dataset. Interpret your result.
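
To make the goodness-of-fit quantities in step 2 concrete, here is a minimal sketch of OLS with an intercept in plain Python. It is not a substitute for the Gretl or Stata output the assignment asks for. The data are a made-up toy sample, and `solve` and `ols` are hypothetical helper names introduced only for this illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [mr - factor * mc for mr, mc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, regressors):
    """OLS with an intercept. Returns (coefficients, R-squared, overall F statistic)."""
    n, k = len(y), len(regressors)
    # Design matrix: a column of ones followed by the k regressors.
    X = [[1.0] + [col[i] for col in regressors] for i in range(n)]
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k + 1)] for p in range(k + 1)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k + 1)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
    r2 = 1.0 - ssr / sst
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))          # H0: all slopes are zero
    return beta, r2, f_stat


# Hypothetical toy data with a single regressor.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta, r2, f_stat = ols(y, [x1])
```

A prediction for given regressor values is the intercept plus the sum of each coefficient times the corresponding value, which is also how the fitted line is drawn on a graph.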

3. Hypothesis Testing: Test the specified linear restriction (individual for every dataset):

• State the null and the alternative hypotheses.

• Compute the value of the test statistic.

• Find the critical value.

• State the rejection rule.

• Give an economic interpretation of the test decision.
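
As a reminder of the standard textbook forms these steps usually take, the statistics can be written as follows. The notation is assumed here rather than taken from the handout: $q$ is the number of restrictions, $k$ the number of regressors, $n$ the sample size, and $SSR_r$ and $SSR_{ur}$ the residual sums of squares of the restricted and unrestricted models.

```latex
% Single restriction H_0: \beta_j = a_j
t = \frac{\hat{\beta}_j - a_j}{\operatorname{se}(\hat{\beta}_j)}
  \sim t_{n-k-1} \quad \text{under } H_0

% q joint linear restrictions
F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}
  \sim F_{q,\,n-k-1} \quad \text{under } H_0

% Rejection rule at the 5% level: reject H_0 if F > c,
% where c is the 95th percentile of the F_{q, n-k-1} distribution.
```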

4. Post-regression Diagnostics: Test for Multicollinearity and Heteroskedasticity

• Test the model for multicollinearity. Present and discuss the results and their potential impact on the model.

• Conduct a Breusch-Pagan test, a White test, and the special case of the White test, using both the F and LM forms of the appropriate test statistic. Use the results to complete the following table.

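| Test | Observed test statistic: F | Observed test statistic: LM | Critical value for 5% test: F | Critical value for 5% test: LM | p-value: F | p-value: LM | Reject (Y/N): F | Reject (Y/N): LM |
|---|---|---|---|---|---|---|---|---|
| BP | | | | | | | | |
| White | | | | | | | | |
| Special White | | | | | | | | |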

• Is there evidence for heteroskedasticity in your model?

• Estimate robust standard errors for the OLS coefficients. Do they change the previous inferences? Discuss.
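
The statistics in step 4 can all be computed from the $R^2$ of an auxiliary regression. The sketch below uses the usual textbook formulas, with made-up inputs. The function names `vif`, `lm_statistic` and `f_statistic` are hypothetical, and the auxiliary $R^2$ values are assumed to come from your own Gretl or Stata output: squared OLS residuals regressed on the regressors (Breusch-Pagan), on the regressors with their squares and cross products (White), or on the fitted values and their squares (special case of the White test).

```python
def vif(r2_j):
    """Variance inflation factor of regressor j.

    r2_j is the R-squared from regressing regressor j on all other regressors.
    """
    return 1.0 / (1.0 - r2_j)


def lm_statistic(r2_aux, n):
    """LM form: n times the R-squared of the auxiliary regression.

    Compared with a chi-squared critical value with q degrees of freedom,
    where q is the number of regressors in the auxiliary regression.
    """
    return n * r2_aux


def f_statistic(r2_aux, n, q):
    """F form for an auxiliary regression with q regressors.

    Compared with an F critical value with (q, n - q - 1) degrees of freedom.
    """
    return (r2_aux / q) / ((1.0 - r2_aux) / (n - q - 1))


# Hypothetical inputs, for illustration only.
example_vif = vif(0.8)
example_lm = lm_statistic(0.05, n=100)
example_f = f_statistic(0.05, n=100, q=4)
```

Critical values and p-values are not computed here; they are read from the chi-squared and F distributions in your statistical package.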

Submission instructions:

• Submit a PDF file with the necessary graphs, tables and interpretations of your findings.

• Utilise the submission format attached alongside this document.

• Make sure you are familiar with the University’s rules and regulations regarding plagiarism (review course outline).

Individual Datasets

STUDENT ID NUMBERS ENDING WITH 0, 1, 2, or 3

• ID Example: 123450, 123451, 123452 or 123453.

• Use the Wooldridge dataset affairs to analyse the determinants of extramarital affairs.

• Find further details on your dataset:

https://www.rdocumentation.org/packages/wooldridge/versions/1.4-2

• Follow the task step details from page 1 along with the following:

1. Perform a descriptive analysis of the dependent variable of interest.

2. Estimate the model and give an economic interpretation of the estimated coefficients.

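$$\mathit{naffairs} = \beta_0 + \beta_1\,\mathit{yrsmarr} + \beta_2\,\mathit{age} + \beta_3\,\mathit{male} + \beta_4\,\mathit{kids} + \beta_5\,\mathit{relig} + \beta_6\,\mathit{ratemarr} + u$$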

Compute the average of each explanatory variable included in the model. According to the estimated model, how many extramarital affairs is an individual with average characteristics predicted to have? Report results for males and females separately.

3. Test whether religiousness (relig) and marriage satisfaction (ratemarr) have no effect on the expected number of extramarital affairs.

4. Test whether there is a difference in extramarital affairs between males and females.

5. Test for Multicollinearity and Heteroskedasticity as described in general instructions.
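
One possible way to set up the prediction and the two tests for the affairs model is sketched below. It assumes that "separately" means holding the other regressors at their sample means while setting male to 1 or 0, which is an interpretation and not something the handout states.

```latex
% Prediction at average characteristics, with male = 1 (men) or male = 0 (women)
\widehat{\mathit{naffairs}} = \hat{\beta}_0 + \hat{\beta}_1\,\overline{\mathit{yrsmarr}}
  + \hat{\beta}_2\,\overline{\mathit{age}} + \hat{\beta}_3\,\mathit{male}
  + \hat{\beta}_4\,\overline{\mathit{kids}} + \hat{\beta}_5\,\overline{\mathit{relig}}
  + \hat{\beta}_6\,\overline{\mathit{ratemarr}}

% Task 3: joint test with q = 2 restrictions (F test)
H_0: \beta_5 = 0,\ \beta_6 = 0 \qquad H_1: H_0 \text{ is not true}

% Task 4: difference between males and females (t test)
H_0: \beta_3 = 0 \qquad H_1: \beta_3 \neq 0
```

Under this setup, the predicted difference between men and women with otherwise identical characteristics is simply $\hat{\beta}_3$.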

STUDENT ID NUMBERS ENDING WITH 4 or 5

• ID Example: 123454 or 123455.

• Use the Wooldridge dataset campus to analyse the elasticity of crimes on campus.

• Find further details on your dataset:

https://www.rdocumentation.org/packages/wooldridge/versions/1.4-2

• Follow the task step details from page 1 along with the following:

1. Perform a descriptive analysis of the dependent variable of interest crime.

2. Estimate the model and give an economic interpretation of the estimated coefficients.

$$\log(\mathit{crime}) = \beta_0 + \beta_1 \log(\mathit{enroll}) + \beta_2\,\mathit{priv} + \beta_3\,\mathit{priv} \cdot \log(\mathit{enroll}) + u$$

Compute the average enrolment in the dataset. According to the estimated model, what is the expected number of crimes on a campus with average enrolment? Report results for private and public colleges separately.

3. Test the following:

a. Are crimes elastic with respect to enrolment? (Hint: test whether $\beta_1 > 1$.)

b. Test whether the elasticity of crime with respect to enroll is the same in private and public colleges.

4. Test for Multicollinearity and Heteroskedasticity as described in general instructions.
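
The two elasticity tests for the campus model follow from differentiating the log-log specification with respect to log(enroll). The sketch below is one way to write them down, assuming priv equals 1 for private colleges and 0 for public ones.

```latex
% Elasticity of crime with respect to enrolment
\frac{\partial \log(\mathit{crime})}{\partial \log(\mathit{enroll})}
  = \beta_1 + \beta_3\,\mathit{priv}
  = \begin{cases}
      \beta_1            & \text{public colleges } (\mathit{priv} = 0) \\
      \beta_1 + \beta_3  & \text{private colleges } (\mathit{priv} = 1)
    \end{cases}

% Task 3a: one-sided t test, following the hint
H_0: \beta_1 = 1 \qquad H_1: \beta_1 > 1 \qquad
t = \frac{\hat{\beta}_1 - 1}{\operatorname{se}(\hat{\beta}_1)}

% Task 3b: equal elasticities in private and public colleges
H_0: \beta_3 = 0 \qquad H_1: \beta_3 \neq 0
```

Note that the t statistic reported by default in regression output tests $\beta_1 = 0$, not $\beta_1 = 1$, so the statistic for task 3a has to be computed separately.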

STUDENT ID NUMBERS ENDING WITH 6 or 7

• ID Example: 123456 or 123457.

• Use the Wooldridge dataset hprice2 to analyse the effect of pollution on house prices.

• Find further details on your dataset:

https://www.rdocumentation.org/packages/wooldridge/versions/1.4-2

• Follow the task step details from page 1 along with the following:

1. Perform a descriptive analysis of the dependent variable of interest price.

2. Estimate the model and give an economic interpretation of the estimated coefficients.

What can you say about the elasticity of house prices with respect to property tax (proptax) and pollution level (nox)?

$$\log(\mathit{price}) = \beta_0 + \beta_1 \log(\mathit{nox}) + \beta_2 \log(\mathit{proptax}) + \beta_3\,\mathit{rooms} + \beta_4\,\mathit{crime} + u$$

Compute the averages of the explanatory variables included in the model. According to the estimated model, what is the expected price of a house with average characteristics?

3. Test whether crime has any effect on house prices.

4. Test whether the elasticity of house prices with respect to property tax is the same as the elasticity of house prices with respect to pollution level.

5. Test for Multicollinearity and Heteroskedasticity as described in general instructions.
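
Testing whether two elasticities are equal involves a restriction on two coefficients at once. A common textbook approach, sketched here with $\theta$ as an assumed name for the difference, is either to compute the standard error of the difference directly or to re-estimate a reparameterised model.

```latex
% Task 4: equal elasticities with respect to nox and proptax
H_0: \beta_1 = \beta_2 \qquad H_1: \beta_1 \neq \beta_2

t = \frac{\hat{\beta}_1 - \hat{\beta}_2}{\operatorname{se}(\hat{\beta}_1 - \hat{\beta}_2)},
\qquad
\operatorname{se}(\hat{\beta}_1 - \hat{\beta}_2)
  = \sqrt{\widehat{\operatorname{Var}}(\hat{\beta}_1) + \widehat{\operatorname{Var}}(\hat{\beta}_2)
          - 2\,\widehat{\operatorname{Cov}}(\hat{\beta}_1, \hat{\beta}_2)}

% Equivalent reparameterisation with \theta = \beta_1 - \beta_2:
\log(\mathit{price}) = \beta_0 + \theta \log(\mathit{nox})
  + \beta_2 \left[ \log(\mathit{nox}) + \log(\mathit{proptax}) \right]
  + \beta_3\,\mathit{rooms} + \beta_4\,\mathit{crime} + u
```

In the reparameterised regression, the reported t statistic on log(nox) tests $\theta = 0$ directly.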

STUDENT ID NUMBERS ENDING WITH 8 or 9

• ID Example: 123458 or 123459.

• Use the Wooldridge dataset BWGHT to analyse the factors that affect the weight of newborn babies.

• See the description of the dataset in your statistical package, or find the same details online: https://www.rdocumentation.org/packages/wooldridge/versions/1.4-2

• Follow the task step details from page 1 along with the following:

1. Perform a descriptive analysis of the dependent variable of interest bwght.

2. Estimate the model and give an economic interpretation of the estimated coefficients.

Construct a variable for the square of the family income.

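$$\mathit{bwght} = \beta_0 + \beta_1\,\mathit{cigs} + \beta_2\,\mathit{faminc} + \beta_3\,\mathit{famincsq} + \beta_4\,\mathit{parity} + \beta_5\,\mathit{male} + u$$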

Compute the averages of the explanatory variables included in the model. According to the estimated model, what is the expected birth weight of a baby with average characteristics? Calculate the turning point of birth weight with respect to family income.

3. Test whether $\beta_4$ and $\beta_5$ are equal to zero.

4. Test whether cigs has any effect on bwght.

5. Test for Multicollinearity and Heteroskedasticity as described in general instructions.
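
For the quadratic in family income, the turning point is the income level at which the estimated marginal effect of income on birth weight is zero. The derivation below assumes that famincsq is constructed as the square of faminc, as task 2 instructs. The symbol $\mathit{faminc}^{*}$ is notation introduced here.

```latex
% Marginal effect of family income
\frac{\partial\,\widehat{\mathit{bwght}}}{\partial\,\mathit{faminc}}
  = \hat{\beta}_2 + 2\hat{\beta}_3\,\mathit{faminc}

% Setting the marginal effect to zero gives the turning point
\hat{\beta}_2 + 2\hat{\beta}_3\,\mathit{faminc}^{*} = 0
\quad\Longrightarrow\quad
\mathit{faminc}^{*} = -\frac{\hat{\beta}_2}{2\hat{\beta}_3}
```

Whether the turning point is a maximum or a minimum depends on the sign of $\hat{\beta}_3$, and it is worth checking whether it lies inside the range of incomes observed in the sample.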