MP 02: Python Package
Goal
The goal of this MP is to create a Python package that implements various univariate regression methods for machine learning.
Deadlines
- Due: Friday, April 5, 11:59 PM
GitHub Repository
To set up your repository for this MP, use the following link:
Context
The goal of this MP is to create a Python package named mluno that contains classes and functions for performing various univariate regression tasks in a machine learning context. The package will have the ability to:
- Simulate data
- Split data
- Train models
- Predict with trained model
- Predict with intervals (DDG only)
- Calculate model metrics
- Visualize learned models
Obviously, packages such as sklearn already exist to do these things. As such, you are not allowed to use these packages. Instead, you must code “from scratch” using only numpy and matplotlib.
Your package should be well documented and well tested.
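As a rough illustration of what “from scratch” with numpy can look like, the sketch below simulates noisy linear data and splits it into training and holdout sets. The function names make_line_data and split_data match names that appear in the grading rubric, but every parameter name, default, and return shape here is an assumption made for illustration. The actual specifications are those in the example documentation website.

```python
import numpy as np


def make_line_data(n_samples=100, beta_0=0.0, beta_1=1.0, sd=1.0, seed=None):
    """Simulate y = beta_0 + beta_1 * x + noise (hypothetical signature)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, size=n_samples)
    y = beta_0 + beta_1 * x + rng.normal(0.0, sd, size=n_samples)
    # Return the feature as a column so it looks like a design matrix.
    return x.reshape(-1, 1), y


def split_data(X, y, holdout_size=0.2, seed=None):
    """Randomly split data into train and holdout parts (hypothetical signature)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.permutation(n)  # shuffled row indices
    n_holdout = int(round(n * holdout_size))
    holdout_idx, train_idx = idx[:n_holdout], idx[n_holdout:]
    return X[train_idx], X[holdout_idx], y[train_idx], y[holdout_idx]


X, y = make_line_data(n_samples=50, seed=42)
X_train, X_test, y_train, y_test = split_data(X, y, holdout_size=0.2, seed=42)
```

Seeding a local generator with np.random.default_rng keeps the simulation reproducible without touching numpy's global random state, which tends to make tests easier to write.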
Video Walkthroughs
Aside from the DDG only portion, this MP will mostly consist of following along with in-class tutorials that will also be recorded and posted. Those recordings will be linked here as they are created.
Directory Structure
Your completed MP should contain only the following directories and files:
./your-github-repo-name/
│
├── src/
│ │
│ └── mluno/
│ ├── __init__.py
│ ├── conformal.py
│ ├── data.py
│ ├── metrics.py
│ ├── plot.py
│ └── regressors.py
│
├── tests/
│ ├── test_conformal.py
│ ├── test_data.py
│ ├── test_metrics.py
│ ├── test_plot.py
│ └── test_regressors.py
│
├── _quarto.yml
├── .gitignore
├── .python-version
├── index.qmd
├── pyproject.toml
└── README.md
Importantly, please use .gitignore to ensure other files are not included, especially your virtual environment. The following should get you started:
__pycache__/
dist/
.venv
.pytest_cache/
.ruff_cache/
/docs/
/reference/
/.quarto/
.DS_Store
objects.json
_sidebar.yml
requirements-dev.lock
requirements.lock
MP Specific Technology
Package Management
Computing and Visualization
Documentation
Testing
Requirements
Your package will have four (or five) modules, contained in the files:
- data.py
- regressors.py
- conformal.py (DDG only)
- metrics.py
- plot.py
The specifications for the modules can be found in the example documentation website:
Documentation
Each function and class must be reasonably well documented. Specifically, you should write docstrings using the numpydoc style that can be used to automatically generate a documentation website using quartodoc and quarto.
You may reference the example documentation when writing your documentation. Your documentation does not need to be as detailed, but should similarly document all functions (including their parameters) and classes (including their attributes, methods, and parameters).
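For reference, a numpydoc-style docstring might look like the following sketch. The function name rmse appears in the grading rubric, but the parameter names, types, and wording below are assumptions. Follow the example documentation for the actual specification.

```python
import numpy as np


def rmse(y_true, y_pred):
    """
    Calculate the root mean squared error (RMSE).

    Parameters
    ----------
    y_true : numpy.ndarray
        Observed target values, shape (n_samples,).
    y_pred : numpy.ndarray
        Predicted target values, shape (n_samples,).

    Returns
    -------
    float
        The square root of the mean squared difference between
        `y_true` and `y_pred`.
    """
    # Parameter names and shapes are illustrative assumptions.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

The section headings underlined with dashes (Parameters, Returns) are what the numpydoc style uses to separate the parts of a docstring.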
Testing
You must set up your package to be tested with pytest. Tests should be placed into the appropriate files described above. The specific tests have been written for you and can be found here:
This file contains all the necessary tests for the package. However, you must place them into the correct files and import the necessary packages and modules to make them run. The tests for conformal prediction should only be included by DDG students who have implemented conformal prediction.
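To show the general shape of a pytest test file (not the provided tests themselves), here is a minimal sketch of what something like tests/test_metrics.py could contain. The mae function is defined inline only so the example is self-contained. In your package you would instead import it, presumably with something like from mluno.metrics import mae, and its signature here is an assumption.

```python
import numpy as np


# Stand-in for the packaged function; in the real test file this would
# be imported from the package instead of being defined here.
def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


# pytest collects functions whose names start with "test_".
def test_mae_is_zero_for_perfect_predictions():
    y = np.array([1.0, 2.0, 3.0])
    assert mae(y, y) == 0.0


def test_mae_known_value():
    y_true = np.array([1.0, 2.0, 3.0])
    y_pred = np.array([2.0, 2.0, 5.0])
    # Absolute errors are 1, 0, and 2, so the mean is 1.
    assert np.isclose(mae(y_true, y_pred), 1.0)
```

Each test function uses plain assert statements, and pytest reports any failing assertion along with the values involved.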
DDG Only Requirements
The files conformal.py and test_conformal.py are for DDG students only. In these files, you will implement and test split conformal prediction (SCP). To learn about SCP, we highly recommend the following slide deck:
Additional Conformal Prediction Information.
- Wikipedia: Conformal Prediction
- Resources: Awesome Conformal Prediction
- Software: Puncc (Predictive uncertainty calibration and conformalization)
- Tutorial: A Conformal Prediction tutorial, an introductive review of the basics
- Notes: Conformal Prediction
- Paper: A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification
- Paper: Conformal Prediction: A Basic Overview
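As a rough orientation, split conformal prediction for regression with absolute-residual scores can be sketched as below. This is a minimal illustration, not the required ConformalPredictor interface: the function name split_conformal_intervals, its arguments, and the use of np.polyfit as the underlying model are all assumptions made for the example.

```python
import numpy as np


def split_conformal_intervals(y_cal, y_cal_pred, y_new_pred, alpha=0.1):
    """Return (lower, upper) bounds for new predictions (hypothetical helper).

    Uses absolute residuals on a held-out calibration set as
    conformity scores.
    """
    scores = np.abs(np.asarray(y_cal) - np.asarray(y_cal_pred))
    n = scores.shape[0]
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th
    # smallest calibration score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # With too few calibration points the interval is unbounded.
    q = np.inf if k > n else np.sort(scores)[k - 1]
    y_new_pred = np.asarray(y_new_pred)
    return y_new_pred - q, y_new_pred + q


# Simulated data, divided into training, calibration, and test parts.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=600)
y = 1.0 + 2.0 * x + rng.normal(0, 1, size=600)
x_train, x_cal, x_test = x[:200], x[200:400], x[400:]
y_train, y_cal, y_test = y[:200], y[200:400], y[400:]

# Any regressor could be used; a least-squares line keeps this numpy-only.
coef = np.polyfit(x_train, y_train, deg=1)
lower, upper = split_conformal_intervals(
    y_cal, np.polyval(coef, x_cal), np.polyval(coef, x_test), alpha=0.1
)

# Fraction of test responses that fall inside their intervals.
coverage = np.mean((y_test >= lower) & (y_test <= upper))
```

The key design point is that the calibration data is kept separate from the data used to fit the model, so the calibration residuals behave like residuals on unseen data. With alpha=0.1 the empirical coverage should land near 90%, though it varies from sample to sample.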
Grading Rubric
For simplicity, the grading rubric will consist of many items, each scored one of 0, 1, or 2. These scores will generally take the meaning:
- 2: Item completed fully and successfully.
- 1: Item completed with minor issues.
- 0: Item incomplete or completed with major issues.
As much as possible, grading of each item will be done independently. However, some items are clearly dependent on others, and often those items are of extra importance. As such, we reserve the right to, when appropriate, allow important items to (negatively) affect the grading of other items.
After cloning your repository, we will grade your package based on the following items:
- Required Package Files: pyproject.toml exists.
- Required Package Files: README.md exists.
- Required Package Files: src/mluno/__init__.py exists.
- Required Package Files: .gitignore exists.
- Required Package Files: .python-version exists.
- Required Package Files: .gitignore is used correctly and there are no additional files.
- Package Management: rye sync runs without error.
- Package Management: Running rye sync creates .venv/.
- Package Management: Created virtual environment contains Python 3.11.
- Package Management: Running rye sync creates requirements-dev.lock.
- Package Management: Running rye sync creates requirements.lock.
- Package Management: rye build runs without error.
- Package Management: Running rye build creates dist/mluno-0.1.0-py3-none-any.whl.
- Package Management: Running rye build creates dist/mluno-0.1.0.tar.gz.
- Package Source: data.py exists.
- Package Source: metrics.py exists.
- Package Source: plot.py exists.
- Package Source: regressors.py exists.
- Package Testing: test_data.py exists.
- Package Testing: test_metrics.py exists.
- Package Testing: test_plot.py exists.
- Package Testing: test_regressors.py exists.
- Package Testing: rye test runs without error.
- Package Testing: rye test runs at least 15 tests that pass.
- Package Documentation: index.qmd exists.
- Package Documentation: _quarto.yml exists.
- Package Documentation: rye run quartodoc build runs without error.
- Package Documentation: Running rye run quartodoc build creates references/.
- Package Documentation: quarto preview runs without error.
- Package Documentation: Running quarto preview creates docs/.
- Package Documentation: Created documentation has a landing page.
- Package Documentation: Created documentation has a navigation bar with a link to the API reference.
- Package Documentation: API reference collects all functions and classes in a sidebar.
- Package Documentation: API reference collects all functions and classes on the reference/ page.
- Package Documentation: API reference has a section for the data module.
- Package Documentation: API reference has a reasonably well documented entry for the data.make_line_data function.
- Package Documentation: API reference has a reasonably well documented entry for the data.make_sine_data function.
- Package Documentation: API reference has a reasonably well documented entry for the data.split_data function.
- Package Documentation: API reference has a section for the metrics module.
- Package Documentation: API reference has a reasonably well documented entry for the metrics.rmse function.
- Package Documentation: API reference has a reasonably well documented entry for the metrics.mae function.
- Package Documentation: API reference has a section for the plot module.
- Package Documentation: API reference has a reasonably well documented entry for the plot.plot_predictions function.
- Package Documentation: API reference has a section for the regressors module.
- Package Documentation: API reference has a reasonably well documented entry for the regressors.KNNRegressor class.
- Package Documentation: API reference has a reasonably well documented entry for the regressors.LinearRegressor class.
DDG Grading Rubric
- Package Source: conformal.py exists.
- Package Testing: test_conformal.py exists.
- Package Testing: rye test runs 19 tests that pass.
- Package Documentation: API reference has a reasonably well documented entry for the metrics.coverage function.
- Package Documentation: API reference has a reasonably well documented entry for the metrics.sharpness function.
- Package Documentation: API reference has a section for the conformal module.
- Package Documentation: API reference has a reasonably well documented entry for the conformal.ConformalPredictor class.
Submission
Two forms of submission are required:
- Push your code to GitHub.
- This is how we will access your code.
- Submit your repository URL to the Canvas assignment named MP 02.
- This is how we will know your code is ready for grading, and will allow us to track late submissions.
- You may only submit to Canvas once. Once you have submitted, we will grade your MP.
- Once you have submitted to Canvas, you should make no further changes to the code pushed to GitHub.
- Students in the DDG section will make an additional submission on Canvas to the assignment named MP 02 DDG.
- Failure to submit to the DDG version in addition to the regular version will result in significant point loss.