Clean, Modular Code: overview#
Part 1: 3 strategies#
PEP 8 & consistent code format#
Generally accepted rules for code format: PEP 8
A set of rules covering whitespace, naming conventions, import order, and more.
Code Formatters: Tools to apply PEP 8 as you work!
Notes#
Messy code.
Oops!
This is nicer code.
Yay!
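The messy and nicer code examples above were originally shown as images. The sketch below is a hypothetical stand-in. The function names `CalcMean` and `calculate_mean` are invented for illustration. Both versions compute the same thing, but only the second follows PEP 8 spacing and naming. Formatters such as Black fix the whitespace automatically. They do not rename things, so the naming fix is up to you.

```python
# Messy: CamelCase function name, one-letter names, no spaces, two statements on one line
def CalcMean(L):
    t=0
    for i in L:t+=i
    return t/len(L)


# Nicer: snake_case names, PEP 8 spacing, one statement per line
def calculate_mean(values):
    total = 0
    for value in values:
        total += value
    return total / len(values)


print(CalcMean([1, 2, 3]))        # 2.0
print(calculate_mean([1, 2, 3]))  # 2.0
```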
(A few) Code format tools#
Jupyter code formatter (installed in this binder).
Black
Ruff
Other tools to consider#
pre-commit hooks: use with git; they apply formatting every time you commit changes
Set up VSCode (and other IDEs) to format on save
Expressive code#
Code can become documentation when written well.
Use variable, function & class names that tell a user what each thing does
This is less nice code.
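The original "less nice code" example was an image. As a hypothetical illustration, both functions below do the same job, but only the second tells the reader what it is for. The function names and the `temperatures_c` data are invented for this sketch.

```python
# Less expressive: the reader has to guess what f, d, x, and i mean
def f(d, x):
    return [i for i in d if i > x]


# More expressive: the names describe the intent, so the code documents itself
def filter_above_threshold(measurements, threshold):
    """Return the measurements that are greater than threshold."""
    return [value for value in measurements if value > threshold]


temperatures_c = [12.5, 18.0, 21.3, 9.8]
print(filter_above_threshold(temperatures_c, threshold=15))  # [18.0, 21.3]
```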
DRY code#
Don’t Repeat Yourself
Use functions, loops and conditionals instead
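A minimal, hypothetical DRY example follows. The first block copies the same formula three times. The second defines the formula once in a function and applies it with a loop. The variable and function names are illustrative only.

```python
# Repetitive: the same conversion formula is copied for every value
temp_1_f = 50
temp_1_c = (temp_1_f - 32) * 5 / 9
temp_2_f = 68
temp_2_c = (temp_2_f - 32) * 5 / 9
temp_3_f = 86
temp_3_c = (temp_3_f - 32) * 5 / 9


# DRY: define the conversion once in a function...
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


# ...and apply it with a loop (here, a list comprehension)
temps_f = [50, 68, 86]
temps_c = [fahrenheit_to_celsius(temp) for temp in temps_f]
print(temps_c)  # [10.0, 20.0, 30.0]
```

If the formula ever needs fixing, the DRY version has only one place to change.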
Begin Activity One!#
You are now familiar with 3 strategies for writing better, cleaner code. You will apply these principles to example code in the first activity.
Remember that this is not a test! It’s a chance to think about how you write code and how others may receive it!
Part 2: Refactor your code#
Three strategies:
Document
Modularize
Ensure reproducibility (dynamic paths)
Document as you go#
Make it a habit
It’s easier to do it as you work!
Document your code#
Add a docstring to the top of any script or module that explains the intent of the code.
A module or function docstring will appear in API documentation if you use tools like autodoc.
"""What this module does"""
import pandas as pd
# Code starts here...
# A sad, lonely, undocumented function
def add_numbers(x, y):
    return x + y

help(add_numbers)
Help on function add_numbers in module __main__:
add_numbers(x, y)
# A sad, lonely, undocumented function
# Better: Add a docstring!
def add_numbers(x, y):
    """
    Add two numbers.
    Parameters
    ----------
    x : int or float
        The first number.
    y : int or float
        The second number.
    Returns
    -------
    int or float
        The sum of `x` and `y`.
    """
    return x + y
help(add_numbers)
Help on function add_numbers in module __main__:
add_numbers(x, y)
Add two numbers.
Parameters
----------
x : int or float
The first number.
y : int or float
The second number.
Returns
-------
int or float
The sum of `x` and `y`.
# Best: add a docstring with examples of running the function
def add_num(x, y):
    """
    Add two numbers.
    Parameters
    ----------
    x : int or float
        The first number.
    y : int or float
        The second number.
    Returns
    -------
    int or float
        The sum of `x` and `y`.
    Examples
    --------
    >>> add_num(2, 3)
    5
    >>> add_num(4.5, 5.5)
    10.0
    """
    return x + y
help(add_num)
Help on function add_num in module __main__:
add_num(x, y)
Add two numbers.
Parameters
----------
x : int or float
The first number.
y : int or float
The second number.
Returns
-------
int or float
The sum of `x` and `y`.
Examples
--------
>>> add_num(2, 3)
5
>>> add_num(4.5, 5.5)
10.0
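The `>>>` lines in the Examples section can also be run as tests with Python's standard-library `doctest` module. In the sketch below, the docstring is shortened to just the Examples section. `doctest.run_docstring_examples` runs those examples and compares each result to the expected output written beneath it.

```python
import doctest


def add_num(x, y):
    """
    Add two numbers.

    Examples
    --------
    >>> add_num(2, 3)
    5
    >>> add_num(4.5, 5.5)
    10.0
    """
    return x + y


# Run the docstring examples against the expected outputs.
# With the default verbose=False, this prints nothing if every example passes
# and prints a failure report for any example whose output doesn't match.
doctest.run_docstring_examples(add_num, {"add_num": add_num}, name="add_num")
```

In a notebook or script, `doctest.testmod()` does the same for every docstring in the current module.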
Modularize your code#
Functions make code easier to read, test, and maintain.
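Here is a hypothetical example of modularizing. Rather than inlining the string handling in a loop, each step gets a small, named function that you can test on its own. The function names and the sample titles are illustrative assumptions, not part of the original material.

```python
def clean_package_name(title):
    """Return the text before the first ':' in title, with surrounding whitespace removed."""
    return title.split(":")[0].strip()


def clean_package_names(titles):
    """Apply clean_package_name to every title in a list."""
    return [clean_package_name(title) for title in titles]


raw_titles = ["pandas: data analysis", "numpy: arrays"]
print(clean_package_names(raw_titles))  # ['pandas', 'numpy']
```

Because `clean_package_name` is separate, you can check it with a single input before trusting it on a whole list.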
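Make your code reproducible across machines with dynamic paths#
Paths on Windows are different from paths on macOS/Linux: Windows separates folders with \ while macOS/Linux use /.
Build paths dynamically using pathlib.Path (or os.path) so the same code works on every operating system.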
# Less good
path = "data/data.json"
path
'data/data.json'
import pathlib
# Dynamically generate paths so they will run on different operating systems
path = pathlib.Path("") / "data" / "data.json"
print(path)
data/data.json
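The `/` operator above builds a path with the separator of whatever operating system runs the code. The standard-library "pure" path classes let you preview how the same path looks on each system without needing that system. `os.path.join` is the older, string-based alternative.

```python
import os
from pathlib import Path, PurePosixPath, PureWindowsPath

# Path uses the separator of the operating system the code is running on
data_path = Path("data") / "data.json"
print(data_path)  # data/data.json on macOS/Linux, data\data.json on Windows

# Pure paths show how the same path renders on each OS
print(PurePosixPath("data") / "data.json")    # data/data.json
print(PureWindowsPath("data") / "data.json")  # data\data.json

# os.path.join is the older, string-based equivalent
print(os.path.join("data", "data.json"))
```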
Tests & checks#
Usability sometimes means failing (gracefully and with intention).
3 Strategies#
Fail fast (with useful error messages)
try/except blocks: handle errors (exceptions)
Conditionals to optimize and redirect workflows
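The third strategy, using conditionals to redirect a workflow, can be sketched as follows. The helper `load_settings`, the `settings.json` file, and the default values are all hypothetical. A conditional like this is only appropriate when a sensible default exists. When there is no safe fallback, failing fast is usually the better choice.

```python
import json
import tempfile
from pathlib import Path


def load_settings(settings_path):
    """Load settings from a JSON file, or fall back to defaults if the file doesn't exist."""
    default_settings = {"verbose": False}
    if settings_path.exists():
        # The file is there: read it
        with settings_path.open("r") as settings_file:
            return json.load(settings_file)
    # The file is missing: redirect to the default workflow instead of crashing
    print(f"{settings_path} not found; using default settings.")
    return default_settings


with tempfile.TemporaryDirectory() as tmp_dir:
    settings_path = Path(tmp_dir) / "settings.json"
    print(load_settings(settings_path))  # file missing, so the defaults are used
    settings_path.write_text(json.dumps({"verbose": True}))
    print(load_settings(settings_path))  # file present, so it is read
```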
Fail fast#
# This code fails with a useful message
import json
from pathlib import Path

import pandas as pd

# Define the file path
file_path = Path("your_file.json")

# Open the JSON file and read the data
with file_path.open("r") as json_file:
    json_data = json.load(json_file)

# Normalize the JSON data into a Pandas DataFrame
df = pd.json_normalize(json_data)
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Cell In[17], line 11
8 file_path = Path("your_file.json")
10 # Open the JSON file and read the data
---> 11 with file_path.open("r") as json_file:
12 json_data = json.load(json_file)
14 # Normalize the JSON data into a Pandas DataFrame
File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/pathlib.py:1044, in Path.open(self, mode, buffering, encoding, errors, newline)
1042 if "b" not in mode:
1043 encoding = io.text_encoding(encoding)
-> 1044 return io.open(self, mode, buffering, encoding, errors, newline)
FileNotFoundError: [Errno 2] No such file or directory: 'your_file.json'
When your code doesn’t fail fast, it may fail somewhere else, far from the real problem.
In the example below, .glob can’t find any JSON files, so the list of paths is empty.
The code then fails with an IndexError when it tries to open file_paths[0], even though the real problem is that the files are missing. This type of error confuses users and takes longer to debug.
# The same problem occurs in this code but it fails with a less useful message
import json
from pathlib import Path

import pandas as pd

# Create a list of json files
file_paths = list(Path(".").glob("*.json"))
# Open the first file
with file_paths[0].open("r") as json_file:
    json_data = json.load(json_file)

# Normalize the JSON data into a Pandas DataFrame
df = pd.json_normalize(json_data)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[18], line 10
8 file_paths = list(Path(".").glob("*.json"))
9 # Open the first file
---> 10 with file_paths[0].open("r") as json_file:
11 json_data = json.load(json_file)
13 # Normalize the JSON data into a Pandas DataFrame
IndexError: list index out of range
# The problem:
file_paths
[]
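One way to make the glob example fail fast is to check for an empty list right after searching and raise an error that names the real problem. The sketch below uses a hypothetical helper, `load_first_json`. To stay minimal, it returns the parsed JSON instead of a DataFrame. You could pass the result to `pd.json_normalize` as in the original code.

```python
import json
import tempfile
from pathlib import Path


def load_first_json(directory):
    """Load and return the data from the first JSON file (alphabetically) in directory."""
    file_paths = sorted(Path(directory).glob("*.json"))
    # Fail fast: stop here with a message that describes the actual problem
    if not file_paths:
        raise FileNotFoundError(f"No .json files found in {Path(directory).resolve()}")
    with file_paths[0].open("r") as json_file:
        return json.load(json_file)


# Usage: an empty temporary directory triggers the early, descriptive error
with tempfile.TemporaryDirectory() as empty_dir:
    try:
        load_first_json(empty_dir)
    except FileNotFoundError as error:
        print(error)  # No .json files found in /path/to/the/temporary/directory
```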
Handle errors with try/excepts#
Anticipate errors a user may encounter when using your code.
Redirect workflows by catching errors.
Provide helpful error messages
# What happens when the title doesn't contain the expected ":" separator?
# Here, the code runs without failing at all, which can hide a bug
title = "package name I'm a title"
title.split(":")
["package name I'm a title"]
# In this case the code is provided with an int - resulting in an attribute error
title = 1
package_name = title.split(":")[0]
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[21], line 3
1 # In this case the code is provided with an int - resulting in an attribute error
2 title = 1
----> 3 package_name = title.split(":")[0]
AttributeError: 'int' object has no attribute 'split'
# This works as expected
title = "package name: i'm a title"
package_name = title.split(":")[0]
package_name
'package name'
In some cases, you may want to capture the error and return a default value (or do something else).
title = 9999
try:
    package_name = title.split(":")[0]
except AttributeError:
    # Ask yourself, how do you want to handle this exception?
    package_name = None
    # Should the code keep running? Should it exit? Should it assign a default value?
    # raise AttributeError(f"Oops - I expected a string and you provided a {type(title)}")
package_name
In other cases you may want to intentionally raise an error with a custom message.
title = 999
try:
    package_name = title.split(":")[0]
except AttributeError:
    # Ask yourself, how do you want to handle this exception?
    # Should the code keep running? Should it exit? Should it assign a default value?
    raise AttributeError(f"Oops - I expected a string and you provided a {type(title)}")

package_name
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[24], line 4
3 try:
----> 4 package_name = title.split(":")[0]
5 except AttributeError:
6 # Ask yourself, how do you want to handle this exception?
7 # Should the code keep running? Should it exit? Should it assign a default value?
AttributeError: 'int' object has no attribute 'split'
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
Cell In[24], line 8
4 package_name = title.split(":")[0]
5 except AttributeError:
6 # Ask yourself, how do you want to handle this exception?
7 # Should the code keep running? Should it exit? Should it assign a default value?
----> 8 raise AttributeError(f"Oops - I expected a string and you provided a {type(title)}")
10 package_name
AttributeError: Oops - I expected a string and you provided a <class 'int'>
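A variation on the example above: the real problem is a value of the wrong type, so Python's built-in `TypeError` may describe it more precisely than `AttributeError`. Writing `raise ... from err` chains the errors, so the traceback reports the original error as the direct cause instead of "During handling of the above exception...". The helper `get_package_name` below is hypothetical.

```python
def get_package_name(title):
    """Return the part of title before the first ':'."""
    try:
        return title.split(":")[0]
    except AttributeError as err:
        # Re-raise as a TypeError with a clear message, chained to the original error
        raise TypeError(
            f"Expected a string title but got {type(title).__name__}: {title!r}"
        ) from err


print(get_package_name("package name: i'm a title"))  # package name

try:
    get_package_name(999)
except TypeError as error:
    print(error)                            # Expected a string title but got int: 999
    print(type(error.__cause__).__name__)   # AttributeError
```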
You’re ready for activity three#
Activity 3 is an interactive notebook you can work on in small groups.
Work through the activities and ask questions!