Clean, Modular Code: overview#

Part 1: 3 strategies#

PEP 8 & consistent code format#

  • Generally accepted rules for code format: PEP 8

  • Set of rules around white space, naming conventions, import order and more.

  • Code Formatters: Tools to apply PEP 8 as you work!

Notes#

# This code is not PEP 8 compliant -- why?
def doStuff(a ,b):print ("Result:",a+b) ; return a+b
x=True
if x: print ( "Messy code.") ; print("Oops!")
Messy code.
Oops!
# This code is PEP 8 compliant -- why?
def do_stuff(number1, number2):
    print("Result:", number1 + number2)
    return number1 + number2


x = True
if x:
    print("This is nicer code.")
    print("Yay!")
This is nicer code.
Yay!

(A few) Code format tools#

  • Jupyter code formatter (installed in this binder).

  • Black

  • Ruff

Other tools to consider#

  • pre-commit hooks: work with Git; run your formatters every time you commit changes

  • Set up VS Code (and other IDEs) to format on save

Expressive code#

  • Code can become documentation when written well.

  • Use variable, function & class names that tell a user what each thing does

A large see-through Tupperware container labeled “Basmati Rice” containing cookies.#

# Could this code be more clear?
def do_stuff(a, b):
    print("Result:", a + b)
    return a + b


x = True
if x:
    print("This is less nice code.")
This is less nice code.
def calculate_sum(number1, number2):
    print("Result:", number1 + number2)
    return number1 + number2


is_valid = True
if is_valid:
    print("This is nicer code.")
    print("Yay!")
This is nicer code.
Yay!

DRY code#

  • Don’t Repeat Yourself

  • Use functions, loops and conditionals instead

Image of copy pasta - ctrl v.
# Not DRY
a = 5
b = 10
print(a + 2)
print(b + 2)
print(a * 2)
print(b * 2)
7
12
10
20
# DRY
def process_number(x):
    print(x + 2)
    print(x * 2)


numbers = [5, 10]
for num in numbers:
    process_number(num)
7
10
12
20

Begin Activity One!#

You are now familiar with 3 strategies for writing better, cleaner code. You will apply these principles to example code in the first activity.

Remember that this is not a test! It’s a chance to think about how you write code and how others may receive it!

Part 2: Refactor your code#

Three strategies:

  • Document

  • Modularize

  • Ensure reproducibility (dynamic paths)

Document as you go#

  • Make it a habit

  • It’s easier to do it as you work!

Document your code#

Add a docstring to the top of any script or module that explains the intent of the code.

A module or function docstring will appear in your API documentation if you build it with a tool like Sphinx autodoc.

"""What this module does"""

import pandas as pd

# Code starts here...
# A sad, lonely, undocumented function
def add_numbers(x, y):
    return x + y


help(add_numbers)
Help on function add_numbers in module __main__:

add_numbers(x, y)
    # A sad, lonely, undocumented function
# Better: Add a docstring!
def add_numbers(x, y):
    """
    Add two numbers.

    Parameters
    ----------
    x : int or float
        The first number.
    y : int or float
        The second number.

    Returns
    -------
    int or float
        The sum of `x` and `y`.
    """
    return x + y
help(add_numbers)
Help on function add_numbers in module __main__:

add_numbers(x, y)
    Add two numbers.
    
    Parameters
    ----------
    x : int or float
        The first number.
    y : int or float
        The second number.
    
    Returns
    -------
    int or float
        The sum of `x` and `y`.
# Best: add a docstring with examples of running the function
def add_num(x, y):
    """
    Add two numbers.

    Parameters
    ----------
    x : int or float
        The first number.
    y : int or float
        The second number.

    Returns
    -------
    int or float
        The sum of `x` and `y`.

    Examples
    --------
    >>> add_num(2, 3)
    5
    >>> add_num(4.5, 5.5)
    10.0
    """
    return x + y
help(add_num)
Help on function add_num in module __main__:

add_num(x, y)
    Add two numbers.
    
    Parameters
    ----------
    x : int or float
        The first number.
    y : int or float
        The second number.
    
    Returns
    -------
    int or float
        The sum of `x` and `y`.
    
    Examples
    --------
    >>> add_num(2, 3)
    5
    >>> add_num(4.5, 5.5)
    10.0
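
The `Examples` section above is written in the interactive-prompt format that Python's built-in `doctest` module understands, so the examples can double as lightweight tests. The following is a minimal sketch of one way to run them, assuming the function is defined in the same script or notebook cell; the docstring is shortened here for space.

```python
import doctest


def add_num(x, y):
    """
    Add two numbers.

    Examples
    --------
    >>> add_num(2, 3)
    5
    >>> add_num(4.5, 5.5)
    10.0
    """
    return x + y


# Parse the examples out of the docstring and run them. A mismatch between
# the documented output and the actual output is counted as a failure.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    add_num.__doc__, {"add_num": add_num}, "add_num", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print("failed:", results.failed)
print("attempted:", results.attempted)
```

If an example in the docstring drifts out of date, the failure count becomes nonzero, which is a useful signal that the documentation needs attention.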

Modularize your code#

Functions make code easier to read, test, and maintain.

# Not DRY, not modular
a = 5
b = 10
print(a + 2)
print(b + 2)
print(a * 2)
print(b * 2)
7
12
10
20
# DRY & modular
def process_number(x):
    print(x + 2)
    print(x * 2)


numbers = [5, 10]
for num in numbers:
    process_number(num)
7
10
12
20
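
One way to make a function like this easier to test is to have it return values instead of printing them. The sketch below is an illustrative variation rather than part of the lesson code; the name `transform_number` is hypothetical.

```python
def transform_number(x):
    """Return x plus two and x times two as a tuple."""
    return x + 2, x * 2


numbers = [5, 10]
results = [transform_number(num) for num in numbers]

# Returned values can be checked directly; printed output cannot.
assert results == [(7, 10), (12, 20)]

for added, doubled in results:
    print(added)
    print(doubled)
```

Separating the calculation from the printing keeps the function reusable in places where you do not want any output.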

Make your code reproducible across machines with dynamic paths#

  • Paths on Windows are written differently than on macOS/Linux

  • Use pathlib.Path (or os.path) to build paths that work on any operating system

# Less good
path = "data/data.json"
path
'data/data.json'
import pathlib

# Dynamically generate paths so they will work on different operating systems
path = pathlib.Path("") / "data" / "data.json"
print(path)
data/data.json
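
To see why building paths from parts helps, the sketch below renders the same parts with both path flavors using `PurePosixPath` and `PureWindowsPath` from the standard library. Anchoring the path to the current working directory is an illustrative choice; your project may need a different anchor.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# The same path parts render with each platform's separator.
parts = ("data", "data.json")
posix_path = PurePosixPath(*parts)
windows_path = PureWindowsPath(*parts)
print(posix_path)    # data/data.json
print(windows_path)  # data\data.json

# Path picks the right flavor for the machine the code runs on.
# Anchoring to the current working directory gives an absolute path
# without hard-coding a user name or drive letter.
data_path = Path.cwd() / "data" / "data.json"
print(data_path.is_absolute())
```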

Tests & checks#

  • Usability sometimes means failing (gracefully and with intention).

3 Strategies#

  • Fail fast (with useful error messages)

  • try/except blocks: handle errors (exceptions)

  • Conditionals to optimize and redirect workflows

Fail fast#

# This code fails with a useful message
import json
from pathlib import Path

import pandas as pd

# Define the file path
file_path = Path("your_file.json")

# Open the JSON file and read the data
with file_path.open("r") as json_file:
    json_data = json.load(json_file)

# Normalize the JSON data into a Pandas DataFrame
df = pd.json_normalize(json_data)
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[17], line 11
      8 file_path = Path("your_file.json")
     10 # Open the JSON file and read the data
---> 11 with file_path.open("r") as json_file:
     12     json_data = json.load(json_file)
     14 # Normalize the JSON data into a Pandas DataFrame

File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/pathlib.py:1044, in Path.open(self, mode, buffering, encoding, errors, newline)
   1042 if "b" not in mode:
   1043     encoding = io.text_encoding(encoding)
-> 1044 return io.open(self, mode, buffering, encoding, errors, newline)

FileNotFoundError: [Errno 2] No such file or directory: 'your_file.json'

When your code doesn’t fail fast, it may fail later, far from the real problem and with a less helpful message. In the example below, .glob can’t find any JSON files, so the list of paths is empty.

The code then fails with an IndexError because file_paths is an empty list, when the real problem is that the files are missing. This type of error is confusing to a user and takes longer to debug.

# The same problem occurs in this code but it fails with a less useful message
import json
from pathlib import Path

import pandas as pd

# Create a list of json files
file_paths = list(Path(".").glob("*.json"))
# Open the first file
with file_paths[0].open("r") as json_file:
    json_data = json.load(json_file)

# Normalize the JSON data into a Pandas DataFrame
df = pd.json_normalize(json_data)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[18], line 10
      8 file_paths = list(Path(".").glob("*.json"))
      9 # Open the first file
---> 10 with file_paths[0].open("r") as json_file:
     11     json_data = json.load(json_file)
     13 # Normalize the JSON data into a Pandas DataFrame

IndexError: list index out of range
# The problem:
file_paths
[]
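
A fail-fast version of the glob example might check the list before indexing into it. This is a minimal sketch, assuming a hypothetical helper named `find_json_files`; the wording of the message is illustrative.

```python
import tempfile
from pathlib import Path


def find_json_files(directory):
    """Return the JSON files in a directory, failing fast if there are none."""
    directory = Path(directory)
    file_paths = sorted(directory.glob("*.json"))
    if not file_paths:
        # Raise here, where the real problem is, with a message that names it.
        raise FileNotFoundError(f"No .json files found in {directory.resolve()}")
    return file_paths


# Try it on an empty temporary directory to see the message.
with tempfile.TemporaryDirectory() as tmp_dir:
    try:
        find_json_files(tmp_dir)
    except FileNotFoundError as error:
        print(error)
```

The user now sees which directory was searched and what was missing, rather than a bare "list index out of range".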

Handle errors with try/excepts#

  • Anticipate errors a user may encounter when using your code.

  • Redirect workflows by catching errors.

  • Provide helpful error messages

# What happens when the title has no colon to split on?
# Here, the code runs and doesn't fail at all, producing a potential bug
title = "package name I'm a title"
title.split(":")
["package name I'm a title"]
# In this case the code is provided with an int - resulting in an attribute error
title = 1
package_name = title.split(":")[0]
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[21], line 3
      1 # In this case the code is provided with an int - resulting in an attribute error
      2 title = 1
----> 3 package_name = title.split(":")[0]

AttributeError: 'int' object has no attribute 'split'
# This works as expected
title = "package name: i'm a title"
package_name = title.split(":")[0]
package_name 
'package name'

In some cases, you may want to capture the error and return a default value (or do something else).

An illustration showing a basic Python try and except block. Under “try:”, there is a green box with the text “Code runs as planned! Yay!”. Below that, under “except:”, there is a light yellow box with the text “Code fails, Do something else”. The image has a minimalist style with a small flower icon and a wavy line header at the top, and includes the pyOpenSci logo in the bottom right corner.
title = 9999

try:
    package_name = title.split(":")[0]
except AttributeError:
    # Ask yourself, how do you want to handle this exception?
    package_name = None
    # Should the code keep running? Should it exit? Should it assign a default value?
    # raise AttributeError(f"Oops - I expected a string and you provided a {type(title)}")

package_name
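
The same pattern can be wrapped in a function so the fallback behavior lives in one place. This is a sketch under assumptions: the name `get_package_name` and its `default` parameter are hypothetical, not part of the lesson code.

```python
def get_package_name(title, default=None):
    """Return the text before the first colon, or a default for non-strings."""
    try:
        return title.split(":")[0]
    except AttributeError:
        # title has no .split method (for example, an int), so fall back.
        return default


print(get_package_name("package name: i'm a title"))  # package name
print(get_package_name(9999))  # None
```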

In other cases you may want to intentionally raise an error with a custom message.

title = 999

try:
    package_name = title.split(":")[0]
except AttributeError:
    # Ask yourself, how do you want to handle this exception?
    # Should the code keep running? Should it exit? Should it assign a default value?
    raise AttributeError(f"Oops - I expected a string and you provided a {type(title)}")

package_name
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[24], line 4
      3 try:
----> 4     package_name = title.split(":")[0]
      5 except AttributeError:
      6     # Ask yourself, how do you want to handle this exception?
      7     # Should the code keep running? Should it exit? Should it assign a default value?

AttributeError: 'int' object has no attribute 'split'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
Cell In[24], line 8
      4     package_name = title.split(":")[0]
      5 except AttributeError:
      6     # Ask yourself, how do you want to handle this exception?
      7     # Should the code keep running? Should it exit? Should it assign a default value?
----> 8     raise AttributeError(f"Oops - I expected a string and you provided a {type(title)}")
     10 package_name

AttributeError: Oops - I expected a string and you provided a <class 'int'>
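
Raising inside an `except` block produces the chained traceback shown above, with both the original and the custom error. One alternative, sketched here under assumptions, is to check the input with a conditional before using it and raise a single `TypeError`. The function name `get_package_name_strict` is hypothetical.

```python
def get_package_name_strict(title):
    """Return the text before the first colon; raise TypeError for non-strings."""
    if not isinstance(title, str):
        # Fail before calling .split so the user sees only one clear error.
        raise TypeError(
            f"Expected title to be a string, but got {type(title).__name__}"
        )
    return title.split(":")[0]


try:
    get_package_name_strict(999)
except TypeError as error:
    print(error)
```

Whether to check up front or catch the exception afterward is a design choice; checking first keeps the traceback short when the only goal is a better message.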

You’re ready for activity three#

Activity 3 is an interactive notebook you can work on in small groups.

Work through the activities and ask questions!