MP 02: Python Package

Be aware of the general Machine Problem Policy document.

Goal

The goal of this MP is to create a Python package that implements various univariate regression methods for machine learning.

Deadlines

  • Due: Friday, April 5, 11:59 PM

GitHub Repository

To set up your repository for this MP, use the following link:

Context

The goal of this MP is to create a Python package named mluno that contains classes and functions for performing various univariate regression tasks in a machine learning context. The package will have the ability to:

  • Simulate data
  • Split data
  • Train models
  • Predict with a trained model
    • Predict with intervals (DDG only)
  • Calculate model metrics
  • Visualize learned models

Obviously, packages that do these things already exist, such as sklearn. However, you are not allowed to use them. Instead, you must code “from scratch” using only numpy and matplotlib.
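As an illustration of working “from scratch” with numpy alone, the sketch below simulates and splits data. The function names make_line_data, make_sine_data, and split_data come from the grading rubric, but every parameter here (n_samples, noise, holdout_size, random_seed, and so on) is an assumption. The example documentation website is the authority on the required signatures.

```python
import numpy as np


def make_line_data(n_samples=100, beta_0=0.0, beta_1=1.0, noise=1.0,
                   x_min=-5.0, x_max=5.0, random_seed=None):
    # Hypothetical signature: simulate y = beta_0 + beta_1 * x + Gaussian noise
    rng = np.random.default_rng(random_seed)
    X = rng.uniform(x_min, x_max, size=(n_samples, 1))  # 2D feature matrix, one column
    y = beta_0 + beta_1 * X[:, 0] + rng.normal(0.0, noise, size=n_samples)
    return X, y


def make_sine_data(n_samples=100, noise=1.0, x_min=-5.0, x_max=5.0, random_seed=None):
    # Hypothetical signature: simulate y = sin(x) + Gaussian noise
    rng = np.random.default_rng(random_seed)
    X = rng.uniform(x_min, x_max, size=(n_samples, 1))
    y = np.sin(X[:, 0]) + rng.normal(0.0, noise, size=n_samples)
    return X, y


def split_data(X, y, holdout_size=0.2, random_seed=None):
    # Hypothetical signature: shuffle indices once, then cut into train and test parts
    rng = np.random.default_rng(random_seed)
    indices = rng.permutation(len(y))
    n_test = int(round(len(y) * holdout_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


X, y = make_line_data(n_samples=50, random_seed=42)
X_train, X_test, y_train, y_test = split_data(X, y, holdout_size=0.2, random_seed=42)
print(X_train.shape, X_test.shape)  # (40, 1) (10, 1)
```

Returning X as a two-dimensional array, even for a single feature, keeps the shape conventions consistent with the regressors that consume it.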

Your package should be well documented and well tested.

Video Walkthroughs

Aside from the DDG-only portion, this MP will mostly consist of following along with in-class tutorials, which will also be recorded and posted. Links to those recordings will be added here as they are created.

Directory Structure

Your completed MP should contain only the following directories and files:

./your-github-repo-name/
│
├── src/
│   │
│   └── mluno/
│       ├── __init__.py
│       ├── conformal.py
│       ├── data.py
│       ├── metrics.py
│       ├── plot.py
│       └── regressors.py
│
├── tests/
│   ├── test_conformal.py
│   ├── test_data.py
│   ├── test_metrics.py
│   ├── test_plot.py
│   └── test_regressors.py
│
├── _quarto.yml
├── .gitignore
├── .python-version
├── index.qmd
├── pyproject.toml
└── README.md

Importantly, please use .gitignore to ensure other files are not included, especially your virtual environment. The following should get you started:

__pycache__/
dist/
.venv
.pytest_cache/
.ruff_cache/
/docs/
/reference/
/.quarto/
.DS_Store
objects.json
_sidebar.yml
requirements-dev.lock
requirements.lock

MP Specific Technology

Package Management

Computing and Visualization

Documentation

Testing

Requirements

Your package will have four modules (five for DDG students), contained in the files:

  • data.py
  • regressors.py
  • conformal.py (DDG only)
  • metrics.py
  • plot.py
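To make the module list more concrete, here is a minimal numpy-only sketch of the two regressors named in the grading rubric, KNNRegressor and LinearRegressor. The fit/predict interface, the k parameter, and the weights attribute are assumptions modeled on common conventions, and docstrings are omitted for brevity.

```python
import numpy as np


class KNNRegressor:
    # Hypothetical interface: k-nearest neighbors regression

    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        # KNN is a "lazy" learner: fitting only stores the training data
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Euclidean distances between each new point and each training point: (n_new, n_train)
        distances = np.linalg.norm(X[:, None, :] - self.X[None, :, :], axis=2)
        # Indices of the k closest training points for each new point
        nearest = np.argsort(distances, axis=1)[:, : self.k]
        # Prediction is the mean response of those neighbors
        return self.y[nearest].mean(axis=1)


class LinearRegressor:
    # Hypothetical interface: ordinary least squares with an intercept

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # Prepend a column of ones so the first coefficient is the intercept
        X_design = np.column_stack([np.ones(len(X)), X])
        coef, *_ = np.linalg.lstsq(X_design, np.asarray(y, dtype=float), rcond=None)
        self.weights = coef  # [intercept, slope]
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return self.weights[0] + X @ self.weights[1:]


X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
print(LinearRegressor().fit(X, y).predict(np.array([[4.0]])))  # approximately [9.]
```

Returning self from fit allows chaining such as `LinearRegressor().fit(X, y).predict(X_new)`. Solving with np.linalg.lstsq avoids explicitly inverting the normal-equations matrix.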

The specifications for the modules can be found in the example documentation website:

Documentation

Each function and class must be reasonably well documented. Specifically, you should write docstrings using the numpydoc style that can be used to automatically generate a documentation website using quartodoc and quarto.
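For reference, a numpydoc docstring uses underlined section headers such as Parameters, Returns, and Examples. The sketch below documents the rmse and mae functions from the rubric this way. The argument names y_true and y_pred are assumptions, not the required interface.

```python
import numpy as np


def rmse(y_true, y_pred):
    """
    Compute the root mean squared error.

    Parameters
    ----------
    y_true : numpy.ndarray
        Observed values of the response, with shape (n_samples,).
    y_pred : numpy.ndarray
        Predicted values of the response, with shape (n_samples,).

    Returns
    -------
    float
        The square root of the average squared difference between
        `y_true` and `y_pred`.

    Examples
    --------
    >>> rmse(np.array([1.0, 2.0]), np.array([1.0, 4.0]))
    1.4142135623730951
    """
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """
    Compute the mean absolute error.

    Parameters
    ----------
    y_true : numpy.ndarray
        Observed values of the response, with shape (n_samples,).
    y_pred : numpy.ndarray
        Predicted values of the response, with shape (n_samples,).

    Returns
    -------
    float
        The average absolute difference between `y_true` and `y_pred`.

    Examples
    --------
    >>> mae(np.array([1.0, 2.0]), np.array([1.0, 4.0]))
    1.0
    """
    return float(np.mean(np.abs(y_true - y_pred)))
```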

You may reference the example documentation when writing your documentation. Your documentation does not need to be as detailed, but it should similarly document all functions (including their parameters) and classes (including their attributes, methods, and parameters).
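The same conventions apply to every function in the package. As a further example, here is a sketch of plot.plot_predictions with a documented parameter list. The parameter names, the assumption that the regressor exposes a predict method, and the returned (fig, ax) pair are all hypothetical rather than the required interface.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_predictions(X, y, regressor, title=""):
    """
    Plot data along with the predictions of a fitted regressor.

    Parameters
    ----------
    X : numpy.ndarray
        Feature values, with shape (n_samples, 1).
    y : numpy.ndarray
        Response values, with shape (n_samples,).
    regressor : object
        A fitted model with a ``predict`` method.
    title : str, optional
        Title for the plot. Default is an empty string.

    Returns
    -------
    tuple of (matplotlib.figure.Figure, matplotlib.axes.Axes)
        The figure and axes containing the plot.
    """
    fig, ax = plt.subplots()
    ax.scatter(X[:, 0], y, color="tab:blue", alpha=0.6, label="Data")
    # Evaluate the model on a dense, sorted grid so the learned curve draws smoothly
    x_grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 500).reshape(-1, 1)
    ax.plot(x_grid[:, 0], regressor.predict(x_grid), color="black", label="Predictions")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title(title)
    ax.legend()
    return fig, ax
```

Returning the figure and axes, rather than calling plt.show(), lets tests inspect the plot without opening a window.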

Testing

You must set up your package to be tested with pytest. Tests should be placed into the appropriate files described above. The specific tests have been written for you and can be found here:

This file contains all the necessary tests for the package. However, you must place them into the correct files and import the necessary packages and modules to make them run. The tests for conformal prediction should only be included by DDG students who have implemented conformal prediction.
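By default, pytest collects files named test_*.py and runs every function in them whose name starts with test_. The sketch below shows the general shape of one such file. The import from mluno.metrics is the kind of line each test file needs. Here it is only a comment, and a stand-in mae is defined so the example runs on its own. Both the test names and the stand-in are hypothetical, not the provided tests.

```python
# Hypothetical tests/test_metrics.py layout.
# In the real file, import the function under test from the installed package:
#     from mluno.metrics import mae
# A stand-in is defined below only so that this sketch runs on its own.
import numpy as np


def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))


# pytest collects any function whose name starts with "test_" in a file named test_*.py
def test_mae_zero_for_perfect_predictions():
    y = np.array([1.0, 2.0, 3.0])
    assert mae(y, y) == 0.0


def test_mae_known_value():
    y_true = np.array([0.0, 0.0, 0.0])
    y_pred = np.array([1.0, -2.0, 3.0])
    # absolute errors are 1, 2, 3, so their mean is 2
    assert np.isclose(mae(y_true, y_pred), 2.0)
```

The test file does not need to import pytest itself unless it uses pytest features such as fixtures or pytest.approx. The rubric runs the suite with rye test.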

DDG Only Requirements

The files conformal.py and test_conformal.py are for DDG students only. In these files, you will implement and test split conformal prediction (SCP). To learn about SCP, we highly recommend the following slide deck:

Additional Conformal Prediction Information.
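A minimal sketch of split conformal prediction follows. Its assumptions are that the wrapped model exposes fit and predict, that absolute residuals serve as conformity scores, and that the ConformalPredictor signature (alpha, calibration_size, random_seed, the quantile attribute) is hypothetical. The model is fit on one part of the data. The threshold q-hat is then the ceil((n + 1)(1 - alpha))-th smallest residual on the n held-out calibration points, and intervals are the prediction plus or minus q-hat. Under exchangeability this covers a new response with probability at least 1 - alpha. The rubric places coverage and sharpness in metrics.py. They appear here so the block runs on its own, using the common definitions (fraction of responses inside their intervals, mean interval width), which are assumptions.

```python
import numpy as np


class ConformalPredictor:
    # Hypothetical interface: split conformal prediction around a fit/predict regressor

    def __init__(self, regressor, alpha=0.05):
        self.regressor = regressor
        self.alpha = alpha

    def fit(self, X, y, calibration_size=0.5, random_seed=None):
        rng = np.random.default_rng(random_seed)
        idx = rng.permutation(len(y))
        n_cal = int(round(len(y) * calibration_size))
        cal, train = idx[:n_cal], idx[n_cal:]
        # 1. Fit the underlying model on the proper training set only
        self.regressor.fit(X[train], y[train])
        # 2. Conformity scores: absolute residuals on the held-out calibration set
        scores = np.abs(y[cal] - self.regressor.predict(X[cal]))
        # 3. q-hat is the ceil((n + 1)(1 - alpha))-th smallest score (infinite if that exceeds n)
        n = len(scores)
        k = int(np.ceil((n + 1) * (1 - self.alpha)))
        self.quantile = np.inf if k > n else np.sort(scores)[k - 1]
        return self

    def predict(self, X):
        y_pred = self.regressor.predict(X)
        # Symmetric interval: point prediction plus or minus q-hat
        return y_pred, y_pred - self.quantile, y_pred + self.quantile


def coverage(y, lower, upper):
    # Assumed definition: fraction of responses that fall inside their intervals
    return float(np.mean((y >= lower) & (y <= upper)))


def sharpness(lower, upper):
    # Assumed definition: average interval width
    return float(np.mean(upper - lower))
```

Because the calibration residuals are never seen during fitting, this guarantee does not depend on the underlying regressor being well specified. A poor model simply yields wider intervals.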

Grading Rubric

For simplicity, the grading rubric will consist of many items, each scored one of 0, 1, or 2. These scores will generally take the meaning:

  • 2: Item completed fully and successfully.
  • 1: Item completed with minor issues.
  • 0: Item incomplete or completed with major issues.

As much as possible, grading of each item will be done independently. However, some items clearly depend on others, and those items are often of extra importance. As such, we reserve the right, when appropriate, to allow important items to (negatively) affect the grading of other items.

After cloning your repository, we will grade your package based on the following items:

  • Required Package Files: pyproject.toml exists.
  • Required Package Files: README.md exists.
  • Required Package Files: src/mluno/__init__.py exists.
  • Required Package Files: .gitignore exists.
  • Required Package Files: .python-version exists.
  • Required Package Files: .gitignore is used correctly and there are no additional files.
  • Package Management: rye sync runs without error.
  • Package Management: Running rye sync creates .venv/.
  • Package Management: Created virtual environment contains Python 3.11.
  • Package Management: Running rye sync creates requirements-dev.lock.
  • Package Management: Running rye sync creates requirements.lock.
  • Package Management: rye build runs without error.
  • Package Management: Running rye build creates dist/mluno-0.1.0-py3-none-any.whl.
  • Package Management: Running rye build creates dist/mluno-0.1.0.tar.gz.
  • Package Source: data.py exists.
  • Package Source: metrics.py exists.
  • Package Source: plot.py exists.
  • Package Source: regressors.py exists.
  • Package Testing: test_data.py exists.
  • Package Testing: test_metrics.py exists.
  • Package Testing: test_plot.py exists.
  • Package Testing: test_regressors.py exists.
  • Package Testing: rye test runs without error.
  • Package Testing: rye test runs at least 15 tests that pass.
  • Package Documentation: index.qmd exists.
  • Package Documentation: _quarto.yml exists.
  • Package Documentation: rye run quartodoc build runs without error.
  • Package Documentation: Running rye run quartodoc build creates reference/.
  • Package Documentation: quarto preview runs without error.
  • Package Documentation: Running quarto preview creates docs/.
  • Package Documentation: Created documentation has a landing page.
  • Package Documentation: Created documentation has a navigation bar with a link to the API reference.
  • Package Documentation: API reference collects all functions and classes in a sidebar.
  • Package Documentation: API reference collects all functions and classes on the reference/ page.
  • Package Documentation: API reference has a section for the data module.
  • Package Documentation: API reference has a reasonably well documented entry for the data.make_line_data function.
  • Package Documentation: API reference has a reasonably well documented entry for the data.make_sine_data function.
  • Package Documentation: API reference has a reasonably well documented entry for the data.split_data function.
  • Package Documentation: API reference has a section for the metrics module.
  • Package Documentation: API reference has a reasonably well documented entry for the metrics.rmse function.
  • Package Documentation: API reference has a reasonably well documented entry for the metrics.mae function.
  • Package Documentation: API reference has a section for the plot module.
  • Package Documentation: API reference has a reasonably well documented entry for the plot.plot_predictions function.
  • Package Documentation: API reference has a section for the regressors module.
  • Package Documentation: API reference has a reasonably well documented entry for the regressors.KNNRegressor class.
  • Package Documentation: API reference has a reasonably well documented entry for the regressors.LinearRegressor class.

DDG Grading Rubric

  • Package Source: conformal.py exists.
  • Package Testing: test_conformal.py exists.
  • Package Testing: rye test runs 19 tests that pass.
  • Package Documentation: API reference has a reasonably well documented entry for the metrics.coverage function.
  • Package Documentation: API reference has a reasonably well documented entry for the metrics.sharpness function.
  • Package Documentation: API reference has a section for the conformal module.
  • Package Documentation: API reference has a reasonably well documented entry for the conformal.ConformalPredictor class.

Submission

Two forms of submission are required:

  • Push your code to Github.
    • This is how we will access your code.
  • Submit your repository URL to the Canvas assignment named MP 02.
    • This is how we will know your code is ready for grading, and will allow us to track late submissions.
    • You may only submit to Canvas once. Once you have submitted, we will grade your MP.
      • Once you have submitted to Canvas, you should make no further changes to the code pushed to GitHub.
    • Students in the DDG section will make an additional submission on Canvas to the assignment named MP 02 DDG.
      • Failure to submit to the DDG version in addition to the regular version will result in significant point loss.