Commit 70cf559

Merge pull request #14 from rickecon/chaps
Merging
2 parents 2a63243 + a3b3579 commit 70cf559

2 files changed

Lines changed: 31 additions & 7 deletions

docs/book/_toc.yml

Lines changed: 6 additions & 0 deletions
@@ -5,6 +5,7 @@ parts:
 chapters:
 - file: contrib/contributing
 - caption: Coding in Python
+numbered: True
 chapters:
 - file: python/intro
 - file: python/StandardLibrary
@@ -17,18 +18,23 @@ parts:
 - file: python/DocStrings
 - file: python/UnitTesting
 - caption: Git and GitHub
+numbered: True
 chapters:
 - file: git/intro
 - caption: Basic Empirical Methods
+numbered: True
 chapters:
 - file: basic_empirics/BasicEmpirMethods
 - caption: Basic Machine Learning
+numbered: True
 chapters:
 - file: basic_ml/ml_intro
 - caption: Neural Nets and Deep Learning
+numbered: True
 chapters:
 - file: deep_learn/intro
 - caption: Structural Estimation
+numbered: True
 chapters:
 - file: struct_est/intro
 - file: struct_est/MaxLikelihood

docs/book/basic_empirics/BasicEmpirMethods.md

Lines changed: 25 additions & 7 deletions
@@ -548,31 +548,49 @@ OLS predicted values for Acemoglu, et al, 2001 data
 :label: ExerBasicEmpir_MultLinRegress
 :class: green
 ```
-For this problem, you will use the 397 observations from the [`Auto.csv`](https://github.com/OpenSourceEcon/CompMethods/tree/main/data/basic_empirics/Auto.csv) dataset in the [`/data/basic_empirics/`](https://github.com/OpenSourceEcon/CompMethods/tree/main/data/basic_empirics) folder of the repository for this book.[^Auto] This dataset includes 397 observations on miles per gallon (`mpg`), number of cylinders (`cylinders`), engine displacement (`displacement`), horsepower (`horsepower`), vehicle weight (`weight`), acceleration (`acceleration`), vehicle year (`year`), vehicle origin (`origin`), and vehicle name (`name`).
+For this problem, you will use the 397 observations from the [`Auto.csv`](https://github.com/OpenSourceEcon/CompMethods/tree/main/data/basic_empirics/Auto.csv) dataset in the [`/data/basic_empirics/`](https://github.com/OpenSourceEcon/CompMethods/tree/main/data/basic_empirics) folder of the repository for this book.[^Auto] This dataset includes 397 observations on the following variables:
+* `mpg`: miles per gallon
+* `cylinders`: number of cylinders
+* `displacement`: engine displacement (cubic inches)
+* `horsepower`: engine horsepower
+* `weight`: vehicle weight (lbs.)
+* `acceleration`: time to accelerate from 0 to 60 mph (sec.)
+* `year`: vehicle year
+* `origin`: origin of car (1=American, 2=European, 3=Japanese)
+* `name`: vehicle name
 1. Import the data using the [`pandas.read_csv()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) function. Look for characters that seem out of place that might indicate missing values. Replace them with missing values using the `na_values=...` option.
-2. Produce a scatterplot matrix which includes all of the quantitative variables `mpg`, `cylinders`, `displacement`, `horsepower`, `weight`, `acceleration`, `year`, `origin`. Call your DataFrame of quantitative variables `df_quant`. [Use the pandas scatterplot function in the code block below.]
+2. Create descriptive statistics for each of the numerical variables (count, mean, standard deviation, min, 25%, 50%, 75%, max). How do you interpret the descriptive statistics on the `origin` variable? What might be a better way to report descriptive statistics for this categorical variable?
+3. Produce a scatterplot matrix which includes all of the numerical variables `mpg`, `cylinders`, `displacement`, `horsepower`, `weight`, `acceleration`, `year`, `origin`. Call your DataFrame of numerical variables `df_numer`. [Use the pandas `scatter_matrix()` function in the code block below.]
 ```python
 from pandas.plotting import scatter_matrix

-scatter_matrix(df_quant, alpha=0.3, figsize=(6, 6), diagonal='kde')
+scatter_matrix(df_numer, alpha=0.3, figsize=(6, 6), diagonal='kde')
 ```
-3. Compute the correlation matrix for the quantitative variables ($8\times 8$) using the [`pandas.DataFrame.corr()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.corr.html) method.
-4. Estimate the following multiple linear regression model of $mpg_i$ on all other quantitative variables, where $u_i$ is an error term for each observation, using Python's `statsmodels.api.OLS()` function.
+4. Compute the correlation matrix for the numerical variables ($8\times 8$) using the [`pandas.DataFrame.corr()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.corr.html) method.
+5. What is wrong with estimating the following linear regression model? How would you fix this problem? (Hint: There is an issue with one of the variables.)
 \begin{equation*}
 \begin{split}
 mpg_i &= \beta_0 + \beta_1 cylinders_i + \beta_2 displacement_i + \beta_3 horsepower_i + ... \\
 &\qquad \beta_4 weight_i + \beta_5 acceleration_i + \beta_6 year_i + \beta_7 origin_i + u_i
 \end{split}
 \end{equation*}
+6. Estimate the following multiple linear regression model of $mpg_i$ on all other numerical variables, where $u_i$ is an error term for each observation, using Python's `statsmodels.api.OLS()` function, with indicator variables created for two out of the three `origin` categories (2=European, 3=Japanese).
+\begin{equation*}
+\begin{split}
+mpg_i &= \beta_0 + \beta_1 cylinders_i + \beta_2 displacement_i + \beta_3 horsepower_i + ... \\
+&\qquad \beta_4 weight_i + \beta_5 acceleration_i + \beta_6 year_i + ...\\
+&\qquad \beta_7 european_i + \beta_8 japanese_i + u_i
+\end{split}
+\end{equation*}
 * Which of the coefficients is statistically significant at the 1\% level?
 * Which of the coefficients is NOT statistically significant at the 10\% level?
 * Give an interpretation in words of the estimated coefficient $\hat{\beta}_6$ on $year_i$ using the estimated value of $\hat{\beta}_6$.
-5. Looking at your scatterplot matrix from part (2), what are the three variables that look most likely to have a nonlinear relationship with $mpg_i$?
+7. Looking at your scatterplot matrix from part (3), what are the three variables that look most likely to have a nonlinear relationship with $mpg_i$?
 * Estimate a new multiple regression model by OLS in which you include squared terms on the three variables you identified as having a nonlinear relationship to $mpg_i$ as well as a squared term on $acceleration_i$.
 * Report your adjusted R-squared statistic. Is it better or worse than the adjusted R-squared from part (6)?
 * What happened to the statistical significance of the $displacement_i$ variable coefficient and the coefficient on its squared term?
 * What happened to the statistical significance of the cylinders variable?
-6. Using the regression model from part (5) and the `.predict()` function, what would be the predicted miles per gallon $mpg$ of a car with 6 cylinders, displacement of 200, horsepower of 100, a weight of 3,100, acceleration of 15.1, model year of 1999, and origin of 1?
+8. Using the regression model from part (6) and the `.predict()` function, what would be the predicted miles per gallon $mpg$ of a car with 6 cylinders, displacement of 200, horsepower of 100, a weight of 3,100, acceleration of 15.1, model year of 1999, and origin of 1 (American)?
 ```{exercise-end}
 ```

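The import and summary steps in parts 1, 2, and 4 of the exercise rely on standard pandas calls. The following is a minimal sketch of those mechanics only. It assumes a tiny made-up CSV string (`csv_text` is hypothetical and is not rows from `Auto.csv`) in which `?` plays the role of the out-of-place missing-value character. The real file may use a different placeholder, which is what part 1 asks you to check.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for Auto.csv; these rows are made up for
# illustration and are NOT the real data. The '?' entries mimic a placeholder
# character that should be read as a missing value.
csv_text = """mpg,cylinders,horsepower,weight,origin
18.0,8,130,3504,1
25.0,4,?,2046,2
31.0,4,65,1773,3
22.0,6,?,2833,1
"""

# na_values tells read_csv to convert '?' to NaN, so horsepower is parsed
# as a float column instead of a string (object) column.
df = pd.read_csv(io.StringIO(csv_text), na_values=["?"])

print(df.dtypes["horsepower"])        # float64
print(df["horsepower"].isna().sum())  # 2

# describe() reports count, mean, std, min, quartiles, and max for each
# numerical column; the horsepower count excludes the two missing values.
print(df.describe())

# Pairwise correlation matrix of the numerical columns (NaNs are dropped
# pair by pair). With only four rows the numbers themselves mean nothing.
print(df.corr())
```

Because `na_values` converts the placeholder to `NaN` at read time, `describe()` and `corr()` can treat `horsepower` as numerical. Without it, pandas would read that column as strings (`object` dtype), and `describe()` would leave it out of the numerical summary.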