MP 02: Python Package
Goal
The goal of this MP is to create a Python package that implements various univariate regression methods for machine learning.
Deadlines
- Due: Friday, April 5, 11:59 PM
GitHub Repository
To set up your repository for this MP, use the following link:
Context
The goal of this MP is to create a Python package named mluno that contains classes and functions for performing various univariate regression tasks in a machine learning context. The package will have the ability to:
- Simulate data
- Split data
- Train models
- Predict with trained models
- Predict with intervals (DDG only)
- Calculate model metrics
- Visualize learned models
Obviously, packages already exist to do these things, like sklearn. As such, you are not allowed to use these packages. Instead, you must code “from scratch” using only numpy and matplotlib.
Your package should be well documented and well tested.
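To make the workflow above concrete, here is a minimal sketch using only numpy: it simulates noisy data from a line, splits it, fits a least-squares line, and computes a test RMSE. The helper names and signatures (simulate_line, split, fit_line) are hypothetical and are not the mluno API. The example documentation website defines the actual specifications.

```python
import numpy as np


def simulate_line(n, slope=2.0, intercept=1.0, noise=0.5, seed=0):
    """Hypothetical helper: draw x uniformly and y from a noisy line."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-3.0, 3.0, size=n)
    y = intercept + slope * x + rng.normal(0.0, noise, size=n)
    return x, y


def split(x, y, holdout_size=0.25, seed=0):
    """Hypothetical helper: shuffle indices, then hold out a fraction."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_holdout = int(round(holdout_size * len(x)))
    hold, train = idx[:n_holdout], idx[n_holdout:]
    return x[train], x[hold], y[train], y[hold]


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    X = np.column_stack([np.ones_like(x), x])  # intercept column + feature
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]


x, y = simulate_line(200)
x_train, x_test, y_train, y_test = split(x, y)
b0, b1 = fit_line(x_train, y_train)
y_pred = b0 + b1 * x_test
rmse = np.sqrt(np.mean((y_test - y_pred) ** 2))
print(f"intercept={b0:.2f}, slope={b1:.2f}, test RMSE={rmse:.2f}")
```

Seeding the random number generator keeps both the simulated data and the split reproducible, which also makes functions like these easier to test.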
Video Walkthroughs
Aside from the DDG only portion, this MP will mostly consist of following along with in-class tutorials that will also be recorded and posted. Those recordings will be linked here as they are created.
Directory Structure
Your completed MP should contain only the following directories and files:
./your-github-repo-name/
│
├── src/
│ │
│ └── mluno/
│ ├── __init__.py
│ ├── conformal.py
│ ├── data.py
│ ├── metrics.py
│ ├── plot.py
│ └── regressors.py
│
├── tests/
│ ├── test_conformal.py
│ ├── test_data.py
│ ├── test_metrics.py
│ ├── test_plot.py
│ └── test_regressors.py
│
├── _quarto.yml
├── .gitignore
├── .python-version
├── index.qmd
├── pyproject.toml
└── README.md
Importantly, please use .gitignore to ensure other files are not included, especially your virtual environment. The following should get you started:
__pycache__/
dist/
.venv
.pytest_cache/
.ruff_cache/
/docs/
/reference/
/.quarto/
.DS_Store
objects.json
_sidebar.yml
requirements-dev.lock
requirements.lock
MP Specific Technology
Package Management
Computing and Visualization
Documentation
Testing
Requirements
Your package will have four (or five) modules, contained in the files:
- data.py
- regressors.py
- conformal.py (DDG only)
- metrics.py
- plot.py
The specifications for the modules can be found in the example documentation website:
Documentation
Each function and class must be reasonably well documented. Specifically, you should write docstrings using the numpydoc style that can be used to automatically generate a documentation website using quartodoc and quarto.
You may reference the example documentation when writing your documentation. Your documentation does not need to be as detailed, but should similarly document all functions (including their parameters) and classes (including their attributes, methods, and parameters).
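As an illustration of the numpydoc style, here is a sketch of a documented function. The name rmse matches the metrics.rmse entry in the rubric, but the parameter names and the wording are assumptions, not the required specification.

```python
import numpy as np


def rmse(y_true, y_pred):
    """
    Calculate the root mean squared error (RMSE).

    Parameters
    ----------
    y_true : numpy.ndarray
        Observed target values, shape (n,).
    y_pred : numpy.ndarray
        Predicted target values, shape (n,).

    Returns
    -------
    float
        The square root of the mean squared difference between
        `y_true` and `y_pred`.

    Examples
    --------
    >>> rmse(np.array([0.0, 0.0]), np.array([2.0, 2.0]))
    2.0
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

The underlined section headers (Parameters, Returns, Examples) are what the numpydoc convention uses to mark the parts of a docstring.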
Testing
You must set up your package to be tested with pytest. Tests should be placed into the appropriate files described above. The specific tests have been written for you and can be found here:
This file contains all the necessary tests for the package. However, you must place them into the correct files and import the necessary packages and modules to make them run. The tests for conformal prediction should only be included by DDG students who have implemented conformal prediction.
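For orientation, a test file generally has the shape sketched below: imports at the top, then functions whose names start with test_. This is an illustrative sketch, not one of the provided tests. The stand-in mae function is defined inline only so the sketch runs on its own, and the import shown in the comment assumes the module layout from the directory structure above.

```python
import numpy as np

# In a real test file this would be an import from the package, e.g.
#     from mluno.metrics import mae
# A stand-in is defined here so the sketch runs on its own.
def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def test_mae_known_values():
    # pytest collects functions whose names start with "test_".
    y_true = np.array([1.0, 2.0, 3.0])
    y_pred = np.array([2.0, 2.0, 5.0])
    assert mae(y_true, y_pred) == 1.0


def test_mae_is_zero_for_perfect_predictions():
    y = np.array([0.5, -1.5, 4.0])
    assert mae(y, y) == 0.0
```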
DDG Only Requirements
The files conformal.py and test_conformal.py are for DDG students only. In these files, you will implement and test split conformal prediction (SCP). To learn about SCP, we highly recommend the following slide deck:
Additional Conformal Prediction Information.
- Wikipedia: Conformal Prediction
- Resources: Awesome Conformal Prediction
- Software: Puncc (Predictive uncertainty calibration and conformalization)
- Tutorial: A Conformal Prediction tutorial, an introductive review of the basics
- Notes: Conformal Prediction
- Paper: A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification
- Paper: Conformal Prediction: A Basic Overview
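As a rough orientation to the idea behind SCP, the sketch below fits a simple model on a training set, measures absolute residuals on a separate calibration set, and uses a finite-sample-corrected quantile of those residuals as the half-width of every prediction interval. The helper names (fit_line, predict_line, conformal_halfwidth) and the three-way split are assumptions made for illustration. They do not reflect the required conformal.ConformalPredictor interface, which is specified in the example documentation.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns array([b0, b1])."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_line(coef, x):
    return coef[0] + coef[1] * x


def conformal_halfwidth(residuals, alpha):
    """Finite-sample-corrected quantile of absolute calibration residuals."""
    n = len(residuals)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the order statistic
    if k > n:
        return np.inf  # too few calibration points for this alpha
    return np.sort(np.abs(residuals))[k - 1]


# Simulated data (assumed setup): a noisy line, split three ways.
rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=1500)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=1500)

x_train, y_train = x[:500], y[:500]
x_cal, y_cal = x[500:1000], y[500:1000]
x_test, y_test = x[1000:], y[1000:]

# 1. Fit on the training set only.
coef = fit_line(x_train, y_train)

# 2. Calibrate: quantile of absolute residuals on held-out calibration data.
q = conformal_halfwidth(y_cal - predict_line(coef, x_cal), alpha=0.1)

# 3. Predict with intervals: point prediction plus/minus the half-width.
pred = predict_line(coef, x_test)
lower, upper = pred - q, pred + q
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"half-width={q:.3f}, empirical coverage={coverage:.3f}")
```

With alpha=0.1 the intervals target roughly 90% coverage on new data. The calibration set must not be used for fitting, since the guarantee relies on the calibration residuals behaving like residuals on unseen points.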
Grading Rubric
For simplicity, the grading rubric will consist of many items, each scored one of 0, 1, or 2. These scores will generally take the meaning:
- 2: Item completed fully and successfully.
- 1: Item completed with minor issues.
- 0: Item incomplete or completed with major issues.
As much as possible, grading of each item will be done independently. However, some items are clearly dependent on others, and often those items are of extra importance. As such, we reserve the right to, when appropriate, allow important items to (negatively) affect the grading of other items.
After cloning your repository, we will grade your package based on the following items:
- Required Package Files: pyproject.toml exists.
- Required Package Files: README.md exists.
- Required Package Files: src/mluno/__init__.py exists.
- Required Package Files: .gitignore exists.
- Required Package Files: .python-version exists.
- Required Package Files: .gitignore is used correctly and there are no additional files.
- Package Management: rye sync runs without error.
- Package Management: Running rye sync creates .venv/.
- Package Management: Created virtual environment contains Python 3.11.
- Package Management: Running rye sync creates requirements-dev.lock.
- Package Management: Running rye sync creates requirements.lock.
- Package Management: rye build runs without error.
- Package Management: Running rye build creates dist/mluno-0.1.0-py3-none-any.whl.
- Package Management: Running rye build creates dist/mluno-0.1.0.tar.gz.
- Package Source: data.py exists.
- Package Source: metrics.py exists.
- Package Source: plot.py exists.
- Package Source: regressors.py exists.
- Package Testing: test_data.py exists.
- Package Testing: test_metrics.py exists.
- Package Testing: test_plot.py exists.
- Package Testing: test_regressors.py exists.
- Package Testing: rye test runs without error.
- Package Testing: rye test runs at least 15 tests that pass.
- Package Documentation: index.qmd exists.
- Package Documentation: _quarto.yml exists.
- Package Documentation: rye run quartodoc build runs without error.
- Package Documentation: Running rye run quartodoc build creates references/.
- Package Documentation: quarto preview runs without error.
- Package Documentation: Running quarto preview creates docs/.
- Package Documentation: Created documentation has a landing page.
- Package Documentation: Created documentation has a navigation bar with a link to the API reference.
- Package Documentation: API reference collects all functions and classes in a sidebar.
- Package Documentation: API reference collects all functions and classes on the reference/ page.
- Package Documentation: API reference has a section for the data module.
- Package Documentation: API reference has a reasonably well documented entry for the data.make_line_data function.
- Package Documentation: API reference has a reasonably well documented entry for the data.make_sine_data function.
- Package Documentation: API reference has a reasonably well documented entry for the data.split_data function.
- Package Documentation: API reference has a section for the metrics module.
- Package Documentation: API reference has a reasonably well documented entry for the metrics.rmse function.
- Package Documentation: API reference has a reasonably well documented entry for the metrics.mae function.
- Package Documentation: API reference has a section for the plot module.
- Package Documentation: API reference has a reasonably well documented entry for the plot.plot_predictions function.
- Package Documentation: API reference has a section for the regressors module.
- Package Documentation: API reference has a reasonably well documented entry for the regressors.KNNRegressor class.
- Package Documentation: API reference has a reasonably well documented entry for the regressors.LinearRegressor class.
DDG Grading Rubric
- Package Source: conformal.py exists.
- Package Testing: test_conformal.py exists.
- Package Testing: rye test runs 19 tests that pass.
- Package Documentation: API reference has a reasonably well documented entry for the metrics.coverage function.
- Package Documentation: API reference has a reasonably well documented entry for the metrics.sharpness function.
- Package Documentation: API reference has a section for the conformal module.
- Package Documentation: API reference has a reasonably well documented entry for the conformal.ConformalPredictor class.
Submission
Two forms of submission are required:
- Push your code to GitHub.
  - This is how we will access your code.
- Submit your repository URL to the Canvas assignment named MP 02.
  - This is how we will know your code is ready for grading, and will allow us to track late submissions.
  - You may only submit to Canvas once. Once you have submitted, we will grade your MP.
  - Once you have submitted to Canvas, you should make no further changes to the code pushed to GitHub.
  - Students in the DDG section will make an additional submission on Canvas to the assignment named MP 02 DDG.
  - Failure to submit to the DDG version in addition to the regular version will result in significant point loss.