Write Flexible Functions to Handle Messy Data#

When working with messy or unpredictable data, your goal is to write robust code to handle the unexpected. It’s important to catch errors early and handle them gracefully. Using functions is a great first step in creating a maintainable and resilient data processing workflow. Functions provide modular units that can be tested independently, making handling edge cases and unexpected scenarios easier.

Adding checks to your functions is the next step towards making your code more robust and maintainable over time.

This lesson covers several strategies for making your code more robust and easier-to-use:

  1. Use try/except blocks rather than simply allowing errors to occur.

  2. Make checks Pythonic

  3. Fail fast

The same function below assumes that the user provides a list that contains a title as a string. Notice that the output of this function is different from what you might expect. How could you fix this?

# This function runs when provided both a list or a string
def clean_title(title):
    """This function always returns the first element"""
    return title[0]

print(clean_title(["hi, i'm a title"]))
print(clean_title("hi, i'm a title"))
hi, i'm a title
h

The function below uses conditional statements to check the input provided by the user. It uses a “look before you leap” approach to check to see if the input is provided in a list format. If it isn’t, it returns the title in its provided format.

def clean_title(title):
    """This function checks explicitly to see if it is provided with a value that is a list. It then 
    makes a decision about how to process the function input based on 
    it's type.  
    """
    if isinstance(title, list):
        return title[0]
    return title


print(clean_title(["hi, i'm a title"]))
print(clean_title("hi, i'm a title"))
print(clean_title(""))
hi, i'm a title
hi, i'm a title

The function below uses a try/except block to handle the title. But again notice that in some cases the code will continue to run, but still returns an unexpected value that will surprise a user.

def clean_title(title):
    """
    This function attempts to return the first character of the title.
    Raises the same error with a friendly, custom error message if the input is invalid.
    """
    try:
        return title[0]
    except IndexError as e:
        raise IndexError(f"Oops! You provided a title in an unexpected format. "
                         f"I expected the title to be provided in a list and you provided "
                         f"a {type(title)}.") from e

print(clean_title(["hi, i am a title"]))
print(clean_title("hi, i am a title"))
print(clean_title(""))  
hi, i am a title
h
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[3], line 7, in clean_title(title)
      6 try:
----> 7     return title[0]
      8 except IndexError as e:

IndexError: string index out of range

The above exception was the direct cause of the following exception:

IndexError                                Traceback (most recent call last)
Cell In[3], line 15
     13 print(clean_title(["hi, i am a title"]))
     14 print(clean_title("hi, i am a title"))
---> 15 print(clean_title(""))  

Cell In[3], line 9, in clean_title(title)
      7     return title[0]
      8 except IndexError as e:
----> 9     raise IndexError(f"Oops! You provided a title in an unexpected format. "
     10                      f"I expected the title to be provided in a list and you provided "
     11                      f"a {type(title)}.") from e

IndexError: Oops! You provided a title in an unexpected format. I expected the title to be provided in a list and you provided a <class 'str'>.

Tip

Informative Errors

Notice we included the value that caused the error in the IndexError message, and a suggestion about what we expected!

Include enough detail in your exceptions so that the person reading them knows why the error occurred, and ideally what they should do about it.

Use Try/Except blocks#

try/except blocks in Python help you handle errors gracefully instead of letting your program crash. They are used when you think a part of your code might fail, like when working with missing data, or when converting data types.

Here’s how try/except blocks work:

  • try block: You write the code that might cause an error here. Python will attempt to run this code.

  • except block: If Python encounters an error in the try block, it jumps to the except block to handle it. You can specify what to do when an error occurs, such as printing a friendly message or providing a fallback option.

A try/except block looks like this[1]:

try:
    # code that might cause an error
except SomeError:
    # what to do if there's an error

Tip

We pulled together some of the more common exceptions that Python can throw here.

def convert_to_int(value):
    try:
        return int(value)
    except ValueError:
        print("Oops i can't process this so I will fail quietly with a print statement.")
        return None  # or some default value
convert_to_int("123")
123
convert_to_int("abc") 
Oops i can't process this so I will fail quietly with a print statement.

This function attempts to convert a value to an integer, returning None and a message if the conversion fails. However, is that message helpful to a person using your code?

Fail fast strategy#

Identify data processing or workflow problems immediately when they occur and throw an error rather than allowing them to propagate through your code. This approach saves time and simplifies debugging, providing clearer, more useful error outputs (stack traces). Below, you can see that the code tries to open a file, but Python can’t find the file. In response, Python throws a FileNotFoundError.

# Open a file (but it doesn't exist
def read_file(file_path):
    with open(file_path, 'r') as file:
        data = file.read()
    return data

file_data = read_file("nonexistent_file.txt")
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[7], line 7
      4         data = file.read()
      5     return data
----> 7 file_data = read_file("nonexistent_file.txt")

Cell In[7], line 3, in read_file(file_path)
      2 def read_file(file_path):
----> 3     with open(file_path, 'r') as file:
      4         data = file.read()
      5     return data

File ~/work/lessons/lessons/.nox/docs-test/lib/python3.11/site-packages/IPython/core/interactiveshell.py:324, in _modified_open(file, *args, **kwargs)
    317 if file in {0, 1, 2}:
    318     raise ValueError(
    319         f"IPython won't let you open fd={file} by default "
    320         "as it is likely to crash IPython. If you know what you are doing, "
    321         "you can use builtins' open."
    322     )
--> 324 return io_open(file, *args, **kwargs)

FileNotFoundError: [Errno 2] No such file or directory: 'nonexistent_file.txt'

You could anticipate a user providing a bad file path. This might be especially possible if you plan to share your code with others and run it on different computers and different operating systems.

In the example below, you use a conditional statement to check if the file exists; if it doesn’t, it returns None. In this case, the code will fail quietly, and the user will not understand that there is an error.

This is also dangerous territory for a user who may not understand why the code runs but doesn’t work.

import os

def read_file_silent(file_path):
    if os.path.exists(file_path):
        with open(file_path, 'r') as file:
            data = file.read()
        return data
    else:
        return None  # Doesn't fail immediately, just returns None

# No error raised, even though the file doesn't exist
file_data = read_file_silent("nonexistent_file.txt")

Even if you know that it is possible for a FileNotFoundFoundError to be raised here, it’s better to raise the exception rather than catch it and proceed silently so the person calling the function knows there is a problem they need to address.

Say for example reading the data was one step in a longer chain of analyses with other steps that take a long time in between when the data was loaded and when it was used:

# The problem occurs here...
data = read_file_silent("nonexistent_file.txt")

# This file takes an hour to download...
big_file = download_file('http://example.com/big_file.exe')

# And this simulation runs overnight...
generated_data = expensive_simulation()

# We'll only realize there is a problem here, 
# and by then the problem might not be obvious!
analyze_data(data, big_file, generated_data)

By silencing the error, we wasted our whole day!

Catching and Using Exceptions#

If we want to raise exceptions as soon as they happen, why would we ever want to catch them with try/catch? Catching exceptions allows us to choose how we react to them - and vice versa when someone else is running our code, raising exceptions lets them know there is a problem and gives them the opportunity to decide how to proceed.

For example, you might have many datasets to process, and you don’t want to waste time processing one that has missing data, but you also don’t want to stop the whole run because an exception happens somewhere deep within the nest of code. The combination of failing fast with error handling allows us to do that!

Here we use the except {exception type} as {variable} syntax to use the error after catching it, and we store error messages for each dataset to analyze and display them at the end:

from rich.pretty import pprint

data = [2, 4, 6, 8, 'uh oh']

def divide_by_two(value):
    return value/2

def my_analysis(data):
    results = {}
    errors = {}
    for value in data:
        try:
            results[value] = divide_by_two(value)
        except TypeError as e:
            errors[value] = str(e)
    return {'results': results, 'errors': errors}

results = my_analysis(data)
pprint(results, expand_all=False)
{
'results': {2: 1.0, 4: 2.0, 6: 3.0, 8: 4.0},
'errors': {'uh oh': "unsupported operand type(s) for /: 'str' and 'int'"}
}

These techniques stack! So add one more level where we imagine someone else is using our analysis code. They might want to raise an exception to stop processing the rest of their data.

def someone_elses_analysis(data):
    processed = my_analysis(data)
    if processed['errors']:
        raise RuntimeError(f"Caught exception from my_analysis: {processed['errors']}")

someone_elses_analysis(data)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[10], line 6
      3     if processed['errors']:
      4         raise RuntimeError(f"Caught exception from my_analysis: {processed['errors']}")
----> 6 someone_elses_analysis(data)

Cell In[10], line 4, in someone_elses_analysis(data)
      2 processed = my_analysis(data)
      3 if processed['errors']:
----> 4     raise RuntimeError(f"Caught exception from my_analysis: {processed['errors']}")

RuntimeError: Caught exception from my_analysis: {'uh oh': "unsupported operand type(s) for /: 'str' and 'int'"}

Customizing error messages#

Recall the exception from our missing file:

file_data = read_file("nonexistent_file.txt")
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[11], line 1
----> 1 file_data = read_file("nonexistent_file.txt")

Cell In[7], line 3, in read_file(file_path)
      2 def read_file(file_path):
----> 3     with open(file_path, 'r') as file:
      4         data = file.read()
      5     return data

File ~/work/lessons/lessons/.nox/docs-test/lib/python3.11/site-packages/IPython/core/interactiveshell.py:324, in _modified_open(file, *args, **kwargs)
    317 if file in {0, 1, 2}:
    318     raise ValueError(
    319         f"IPython won't let you open fd={file} by default "
    320         "as it is likely to crash IPython. If you know what you are doing, "
    321         "you can use builtins' open."
    322     )
--> 324 return io_open(file, *args, **kwargs)

FileNotFoundError: [Errno 2] No such file or directory: 'nonexistent_file.txt'

Focusing Information - Raising New Exceptions#

The error is useful because it fails and provides a simple and effective message that tells the user to check that their file path is correct. But there’s a lot of information there! The traceback shows us each line of code in between the place where you called the function and where the exception was raised: the read_file() call, the open() call, the bottom-level IPython exception. If you wanted to provide less information to the user, you could catch it and raise a new exception.

If you simply raise a new exception, it is chained to the previous error, which is noisier, not tidier!

def read_file(file_path):
    try:
        with open(file_path, 'r') as file:
            data = file.read()
        return data
    except FileNotFoundError:
        raise FileNotFoundError(
            f"Oops! I couldn't find the file located at: {file_path}. "
            "Please check to see if it exists."
        )  # no "from" statement implicitly chains the prior error


file_data = read_file("nonexistent_file.txt")
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[12], line 3, in read_file(file_path)
      2 try:
----> 3     with open(file_path, 'r') as file:
      4         data = file.read()

File ~/work/lessons/lessons/.nox/docs-test/lib/python3.11/site-packages/IPython/core/interactiveshell.py:324, in _modified_open(file, *args, **kwargs)
    318     raise ValueError(
    319         f"IPython won't let you open fd={file} by default "
    320         "as it is likely to crash IPython. If you know what you are doing, "
    321         "you can use builtins' open."
    322     )
--> 324 return io_open(file, *args, **kwargs)

FileNotFoundError: [Errno 2] No such file or directory: 'nonexistent_file.txt'

During handling of the above exception, another exception occurred:

FileNotFoundError                         Traceback (most recent call last)
Cell In[12], line 13
      6     except FileNotFoundError:
      7         raise FileNotFoundError(
      8             f"Oops! I couldn't find the file located at: {file_path}. "
      9             "Please check to see if it exists."
     10         )  # no "from" statement implicitly chains the prior error
---> 13 file_data = read_file("nonexistent_file.txt")

Cell In[12], line 7, in read_file(file_path)
      5     return data
      6 except FileNotFoundError:
----> 7     raise FileNotFoundError(
      8         f"Oops! I couldn't find the file located at: {file_path}. "
      9         "Please check to see if it exists."
     10     )

FileNotFoundError: Oops! I couldn't find the file located at: nonexistent_file.txt. Please check to see if it exists.

Instead we can use the exception chaining syntax, raise {exception} from {other exception}, to explicitly exclude the original error from the traceback.

def read_file(file_path):
    try:
        with open(file_path, 'r') as file:
            data = file.read()
        return data
    except FileNotFoundError:
        raise FileNotFoundError(
            f"Oops! I couldn't find the file located at: {file_path}. "
            "Please check to see if it exists."
        ) from None  # explicitly break the exception chain


file_data = read_file("nonexistent_file.txt")
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[13], line 13
      6     except FileNotFoundError:
      7         raise FileNotFoundError(
      8             f"Oops! I couldn't find the file located at: {file_path}. "
      9             "Please check to see if it exists."
     10         ) from None  # explicitly break the exception chain
---> 13 file_data = read_file("nonexistent_file.txt")

Cell In[13], line 7, in read_file(file_path)
      5     return data
      6 except FileNotFoundError:
----> 7     raise FileNotFoundError(
      8         f"Oops! I couldn't find the file located at: {file_path}. "
      9         "Please check to see if it exists."
     10     ) from None

FileNotFoundError: Oops! I couldn't find the file located at: nonexistent_file.txt. Please check to see if it exists.

This code example below is better than the examples above for three reasons:

  1. It’s pythonic: it asks for forgiveness later by using a try/except

  2. It fails quickly - as soon as it tries to open the file. The code won’t continue to run after this step fails.

  3. It raises a clean, useful error that the user can understand

Adding Information - Using Notes#

The above exception is tidy, and it’s reasonable to do because we know exactly where the code is expected to fail.

The disadvantage to breaking exception chains is that you might not know what is going to cause the exception, and by removing the traceback, you hide potentially valuable information.

To add information without raising a new exception, you can use the Exception.add_note() method and then re-raise the same error:

def read_file(file_path):
    try:
        with open(file_path, 'r') as file:
            data = file.read()
        return data
    except FileNotFoundError as e:
        e.add_note("Here's the deal, we both know that file should have been there, but now its not ok?")
        raise e

# Raises an error immediately if the file doesn't exist
file_data = read_file("nonexistent_file.txt")
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[14], line 11
      8         raise e
     10 # Raises an error immediately if the file doesn't exist
---> 11 file_data = read_file("nonexistent_file.txt")

Cell In[14], line 8, in read_file(file_path)
      6 except FileNotFoundError as e:
      7     e.add_note("Here's the deal, we both know that file should have been there, but now its not ok?")
----> 8     raise e

Cell In[14], line 3, in read_file(file_path)
      1 def read_file(file_path):
      2     try:
----> 3         with open(file_path, 'r') as file:
      4             data = file.read()
      5         return data

File ~/work/lessons/lessons/.nox/docs-test/lib/python3.11/site-packages/IPython/core/interactiveshell.py:324, in _modified_open(file, *args, **kwargs)
    317 if file in {0, 1, 2}:
    318     raise ValueError(
    319         f"IPython won't let you open fd={file} by default "
    320         "as it is likely to crash IPython. If you know what you are doing, "
    321         "you can use builtins' open."
    322     )
--> 324 return io_open(file, *args, **kwargs)

FileNotFoundError: [Errno 2] No such file or directory: 'nonexistent_file.txt'
Here's the deal, we both know that file should have been there, but now its not ok?

Make Checks Pythonic#

Python has a unique philosophy regarding handling potential errors or exceptional cases. This philosophy is often summarized by the acronym EAFP: “Easier to Ask for Forgiveness than Permission.” When combined with the fail fast approach, your code can be flexible and resilient to the messy realities of data processing.

EAFP vs. LBYL#

There are two main approaches to handling potential errors:

  • EAFP (Easier to Ask for Forgiveness than Permission): Assume the operation will succeed and handle any exceptions if they occur.

  • LBYL (Look Before You Leap): Check for conditions before making calls or accessing data.

Pythonic code generally favors the EAFP approach, which allows for failing fast when an error occurs, providing useful feedback without unnecessary checks.

# LBYL approach - manually check that the user provides an int

def convert_to_int(value):
    if isinstance(value, int):
        return int(value)
    else:
        print("Oops i can't process this so I will fail gracefully.")
        return None 

convert_to_int(1)
convert_to_int("a")
Oops i can't process this so I will fail gracefully.
# EAFP approach - Consider what the user might provide and catch the error.

def convert_to_int(value):
    try:
        return int(value)
    except ValueError:
        print("Oops i can't process this so I will fail gracefully.")
        return None  # or some default value

convert_to_int(1)
convert_to_int("a")
Oops i can't process this so I will fail gracefully.

The EAFP (Easier to Ask for Forgiveness than Permission) approach is more Pythonic because:

  • It’s often faster, avoiding redundant checks when operations succeed.

  • It’s more readable, separating the intended operation and error handling.

Any Check is a Good Check#

As long as you consider edge cases, you’re writing great code! You don’t need to worry about being “Pythonic” immediately, but understanding both approaches is useful regardless of which approach you chose.