Tests & Checks for your code: Activity 3#

In this activity, you will build checks into your workflow by adding try/except blocks to functions so they can handle some “features” found in the JOSS CrossRef citation data.

Note

A data feature, as defined here, represents unexpected values that may be found in real-world data. You will rarely find that your data can be processed without some cleaning steps!

Real world data processing & workflows and edge cases#

Real-world data can rarely be imported without cleanup steps. You will often find unusual data values you don’t expect. Sometimes these values are documented for you: for example, a 9999 may represent a missing value in a dataset. Yay!

Other times, the data contains undocumented typos and other errors that you need to handle in your code. In this activity, you will see these unusual values referred to as data “edge cases.”

Writing robust code that handles unexpected values will make your code run smoothly and fail gracefully. This type of code, which combines functions (or classes) and checks within the functions that handle messy data, will make your code easier to maintain.

Strategies for handling messy data#

There are several strategies that you can employ to handle unusual data values. In this activity, you will apply the following strategies to make your code more robust, maintainable & usable:

  • conditional statements to check for specific conditions before executing code. This allows you to create different pathways for code to execute based on specific conditions.

  • try/except blocks allow you to handle potential errors by attempting an operation and catching any exceptions if they occur, providing useful feedback. Sometimes, you may want the program to end on an error. In other cases, you may want to handle it in a specific way.

  • fail fast with useful error messages: Failing fast is a software engineering term that means allowing your code to stop when something goes wrong, ensuring that errors are caught and communicated promptly. This helps the user quickly understand the error, what went wrong, and where.

Tip

As you make decisions about adding checks to your code, weigh the value of a Pythonic approach (easier to ask for forgiveness than permission) against literal checks (look before you leap) for addressing potential errors in your code. This means asking yourself whether the code should try the operation and handle any error afterward, or check an object’s state or type before proceeding.

# Opening this path raises a FileNotFoundError, but the except block hides it
import json
from pathlib import Path

import pandas as pd

file_path = Path("data-bad-path") / "2022-03-joss-publications.json"

try:
    print(file_path)
    with file_path.open("r") as json_file:
        json_data = json.load(json_file)
    json_clean = pd.json_normalize(json_data)
except FileNotFoundError:
    print("This doesn't fail fast, it only prints a message")

print("Look, I keep running after the try/except block, which means I didn't fail")
data-bad-path/2022-03-joss-publications.json
This doesn't fail fast, it only prints a message
Look, I keep running after the try/except block, which means I didn't fail
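
The two styles described in the tip above can be sketched side by side. This is a minimal illustration rather than part of the lesson's workflow: the function names read_text_lbyl and read_text_eafp are hypothetical, and the path is deliberately one that is assumed not to exist.

```python
from pathlib import Path


def read_text_lbyl(file_path: Path) -> str:
    """Look before you leap: check the file's state before opening it."""
    if not file_path.is_file():
        raise FileNotFoundError(f"Oops! I couldn't find the file: {file_path}")
    return file_path.read_text()


def read_text_eafp(file_path: Path) -> str:
    """Ask for forgiveness: try to read the file and handle the failure."""
    try:
        return file_path.read_text()
    except FileNotFoundError as err:
        raise FileNotFoundError(
            f"Oops! I couldn't find the file: {file_path}"
        ) from err


# Assumption: this path does not exist on your machine
missing_file = Path("data-bad-path") / "2022-03-joss-publications.json"

for reader in (read_text_lbyl, read_text_eafp):
    try:
        reader(missing_file)
    except FileNotFoundError as err:
        print(f"{reader.__name__} raised: {err}")
```

Both functions end up raising the same error type with the same message. The difference is when the check happens: the first inspects the path up front, while the second attempts the read and reacts only if it fails.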

Failing fast#

If you are processing specific data in your workflow, then ensuring your code can successfully find the data is your first (and possibly most important) goal.

Consider: How does your code handle and tell a user that it can’t find the data that you want it to open?

If your code doesn’t fail fast with a useful error message and instead keeps running and fails later, it can confuse a user. The error raised later will likely not tell the user that the real issue is missing data rather than something else.

This will then mislead someone when trying to troubleshoot your code.
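
One way to fail fast is to catch the low-level error at the point where the file is opened and re-raise it with a message that names the missing path. The sketch below assumes a hypothetical helper, load_json_or_fail, and uses only the standard library. Treat it as one possible approach, not as the lesson's reference solution.

```python
import json
from pathlib import Path


def load_json_or_fail(file_path: Path):
    """Load a JSON file, stopping right away with a clear message if it is missing."""
    try:
        with file_path.open("r") as json_file:
            return json.load(json_file)
    except FileNotFoundError as err:
        # Re-raise so the program stops here instead of failing later
        raise FileNotFoundError(
            f"Can't find {file_path}. Check that the data directory exists "
            "and that the file name is spelled correctly."
        ) from err
```

Calling load_json_or_fail with a path that doesn't exist stops the program at that call, and the last line of the traceback carries the custom message, so the user sees right away that the data is missing.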

Functions, classes, and methods are a tool#

Using functions and class methods is a great first step in handling messy data. A function or method provides a modular unit you can test outside the workflow for the edge cases you may encounter. Also, because a function is a modular unit, you can add elements to handle unexpected processing features as you build your workflow.

Once you have these functions and methods, you can add checks using conditional statements and try/except blocks that anticipate edge cases and errors you may encounter when processing your data.

Activity 3, part 1#

Turn the code below into a function. Add a try/except statement that raises a FileNotFoundError with a custom message that will stop the code from continuing to run.

from pathlib import Path 

file_path = Path("data-bad-path") / "2022-03-joss-publications.json"
print(file_path)
with file_path.open("r") as json_file:
    json_data = json.load(json_file)
json_clean = pd.json_normalize(json_data)

json_clean.head(2)
data-bad-path/2022-03-joss-publications.json
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[2], line 5
      3 file_path = Path("data-bad-path") / "2022-03-joss-publications.json"
      4 print(file_path)
----> 5 with file_path.open("r") as json_file:
      6     json_data = json.load(json_file)
      7 json_clean = pd.json_normalize(json_data)

File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/pathlib.py:1044, in Path.open(self, mode, buffering, encoding, errors, newline)
   1042 if "b" not in mode:
   1043     encoding = io.text_encoding(encoding)
-> 1044 return io.open(self, mode, buffering, encoding, errors, newline)

FileNotFoundError: [Errno 2] No such file or directory: 'data-bad-path/2022-03-joss-publications.json'

Activity 3, part 2#

Turn the code below into a function. Modify the function so it catches the error and prints a custom error message but does not stop your code from continuing to run.

Consider the difference between the function that you created above, which stops your code from running by raising an exception, and this function, which prints a statement for the user.

from pathlib import Path 

file_path = Path("data-bad-path") / "2022-03-joss-publications.json"
print(file_path)
with file_path.open("r") as json_file:
    json_data = json.load(json_file)
json_clean = pd.json_normalize(json_data)

json_clean.head(2)
data-bad-path/2022-03-joss-publications.json
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[3], line 5
      3 file_path = Path("data-bad-path") / "2022-03-joss-publications.json"
      4 print(file_path)
----> 5 with file_path.open("r") as json_file:
      6     json_data = json.load(json_file)
      7 json_clean = pd.json_normalize(json_data)

File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/pathlib.py:1044, in Path.open(self, mode, buffering, encoding, errors, newline)
   1042 if "b" not in mode:
   1043     encoding = io.text_encoding(encoding)
-> 1044 return io.open(self, mode, buffering, encoding, errors, newline)

FileNotFoundError: [Errno 2] No such file or directory: 'data-bad-path/2022-03-joss-publications.json'
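
For contrast with raising an exception, the sketch below catches the error, prints a message, and lets the program continue. The helper name load_json_or_warn and the choice to return None are assumptions made for illustration. Returning None is only one way to signal that nothing was loaded.

```python
import json
from pathlib import Path


def load_json_or_warn(file_path: Path):
    """Load a JSON file, or print a message and return None if it is missing."""
    try:
        with file_path.open("r") as json_file:
            return json.load(json_file)
    except FileNotFoundError:
        print(f"Skipping {file_path}: the file doesn't exist.")
        return None


# Assumption: this path does not exist on your machine
json_data = load_json_or_warn(Path("data-bad-path") / "2022-03-joss-publications.json")

# The program keeps running, so every caller has to check for None
if json_data is None:
    print("No data was loaded, so there is nothing to process.")
```

The trade-off is visible in the last lines: because the function no longer stops the program, the responsibility for noticing the missing data moves to whoever calls it.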

Activity 3, part 2: Find the data & fail fast when it’s missing#

Activity 3, part 1 code example#

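Consider the code below. Note that it points to a data directory, data-bad, that doesn’t exist. Because the file names are listed explicitly, the code stops at the first file with Python’s default FileNotFoundError.

If the list of files were instead built with data_dir.glob("*.json"), as in the full workflow at the end of this lesson, the loop would never run and pd.concat would raise a ValueError (ValueError: No objects to concatenate), which is much less useful to a user (who could be your future self).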

Group work

In small groups, consider the code and answer the following questions together.

Questions:

  • Does the code fail fast?

  • What type of error do you want Python to throw when it can’t find a data file? Use Google, LLMs, or our tests and checks lesson to help figure this out.

  • Does the code handle the actual error gracefully?

  • How can you make the code better handle missing data files?

def load_clean_json(file_path, columns_to_keep):
    """
    Load JSON data from a file. Drop unnecessary columns and normalize
    to DataFrame.

    Parameters
    ----------
    file_path : Path
        Path to the JSON file.
    columns_to_keep : list
        List of columns to keep in the DataFrame.

    Returns
    -------
    pandas.DataFrame
        Loaded JSON data.
    """

    with file_path.open("r") as json_file:
        json_data = json.load(json_file)
    return pd.json_normalize(json_data)


load_clean_json("path-here")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[4], line 24
     20         json_data = json.load(json_file)
     21     return pd.json_normalize(json_data)
---> 24 load_clean_json("path-here")

TypeError: load_clean_json() missing 1 required positional argument: 'columns_to_keep'
import json
import os
from pathlib import Path

import pandas as pd


def load_clean_json(file_path, columns_to_keep):
    """
    Load JSON data from a file. Drop unnecessary columns and normalize
    to DataFrame.

    Parameters
    ----------
    file_path : Path
        Path to the JSON file.
    columns_to_keep : list
        List of columns to keep in the DataFrame.

    Returns
    -------
    pandas.DataFrame
        Loaded JSON data.
    """

    with file_path.open("r") as json_file:
        json_data = json.load(json_file)
    return pd.json_normalize(json_data)


columns_to_keep = [
    "publisher",
    "DOI",
    "type",
    "author",
    "is-referenced-by-count",
    "title",
    "published.date-parts",
]
# Notice that this is a bad data directory
# What happens when your code runs?
data_dir = Path("data-bad")

files = [
    "2022-01-joss-publications.json",
    "2022-02-joss-publications.json",
    "2022-03-joss-publications.json",
]

# Create a list of Path objects
all_files = [data_dir / file for file in files]

all_papers_list = []
# Load each file; the first missing file stops the loop
for json_file in all_files:
    papers_df = load_clean_json(json_file, columns_to_keep)
    all_papers_list.append(papers_df)

all_papers_df = pd.concat(all_papers_list, axis=0, ignore_index=True)
all_papers_df
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[5], line 56
     54 # Load each file; the first missing file stops the loop
     55 for json_file in all_files:
---> 56     papers_df = load_clean_json(json_file, columns_to_keep)
     57     all_papers_list.append(papers_df)
     59 all_papers_df = pd.concat(all_papers_list, axis=0, ignore_index=True)

Cell In[5], line 26, in load_clean_json(file_path, columns_to_keep)
      8 def load_clean_json(file_path, columns_to_keep):
      9     """
     10     Load JSON data from a file. Drop unnecessary columns and normalize
     11     to DataFrame.
   (...)
     23         Loaded JSON data.
     24     """
---> 26     with file_path.open("r") as json_file:
     27         json_data = json.load(json_file)
     28     return pd.json_normalize(json_data)

File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/pathlib.py:1044, in Path.open(self, mode, buffering, encoding, errors, newline)
   1042 if "b" not in mode:
   1043     encoding = io.text_encoding(encoding)
-> 1044 return io.open(self, mode, buffering, encoding, errors, newline)

FileNotFoundError: [Errno 2] No such file or directory: 'data-bad/2022-01-joss-publications.json'

Activity 3: part 1 try/excepts & files#

Modify the file load function above by adding a try/except block so that it returns a custom error message when it can’t find a file but returns the normalized data when it can.

Activity 3, part 3: Add checks to the format_date function#

The code below creates a pandas.DataFrame with the first 15 publications in the JOSS sample data.json file. This is the first of 3 files you must process in your workflow.

Your first task is to process and format the published_date column in the data to make it a pandas.Timestamp object. Having a date in a datetime format like pandas.Timestamp or datetime.datetime will allow you to do time-related analysis on your data, such as counting publications by month and year! The expected CrossRef published date should be:

"published": {
      "date-parts": [
        [
          2022,
          11,
          27
        ]
      ]
    }

However, the date is not always formatted as expected in the sample data below.

For this activity, focus on adding checks to the format_date function. IMPORTANT: Use the sample data provided below for your troubleshooting exercise. This will allow you to focus on fixing only one function rather than trying to troubleshoot the entire workflow!

Activity 2: Part 2

In small groups, do the following:

  1. Evaluate the published_date field in the data created below and answer the question:

  • Do you see any unusually formatted values that may be responsible for making your code above fail?

  2. Once you have a list of issues you observe in the data, address them by modifying the format_date function below.

Format dates with pandas.to_datetime()#

Let’s work on formatting dates so there is a consistent format in our dataframe. Python has a string formatting language that defines useful characters for formatting.

What Does 02d Mean?

  • d: This part of the format code means you’re expecting an integer. It tells Python to format the value as a decimal (whole number).

  • 02: The 02 means the number should be padded with leading zeros if necessary, so the total width is 2 digits. For example:

  • 1 becomes 01

  • 5 becomes 05

  • 12 stays as 12 (no padding needed)

This is especially useful for formatting months or days, which often require a MM-DD format (e.g., 01-05 for January 5th).
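
A few lines are enough to try the 02d format code on its own. The values below are made up for illustration. The last two cases show why a month or day stored as a string needs converting before it can be formatted.

```python
month, day = 1, 5

# Integers are padded with a leading zero to a width of two
print(f"{month:02d}-{day:02d}")  # prints 01-05
print(f"{12:02d}")  # prints 12

# A string is rejected because "d" only formats integers
try:
    print(f"{'5':02d}")
except ValueError as err:
    print(f"ValueError: {err}")

# Converting to int first makes the string safe to format
print(f"{int('5'):02d}")  # prints 05
```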

import pandas as pd

# Manually recreate data for the first 15 crossref entries
joss_pubs = [
    {
        "title": ["bmiptools: BioMaterials Image Processing Tools"],
        "published_date": [["2022", "11", "27"]],
        "citations": 2,
    },
    {
        "title": [
            [
                "QuasinormalModes.jl: A Julia package for computing discrete eigenvalues of second order ODEs"
            ]
        ],
        "published_date": [2022, "5", 25],
        "citations": 2,
    },
    {
        "title": [
            "CWInPy: A Python package for inference with continuous gravitational-wave signals from pulsars"
        ],
        "published_date": [[2022, 9, "29"]],
        "citations": 3,
    },
    {
        "title": [
            "Nempy: A Python package for modelling the Australian National Electricity Market dispatch procedure"
        ],
        "published_date": [[""]],
        "citations": 2,
    },
    {
        "title": [
            "Spectral Connectivity: a python package for computing spectral coherence and related measures"
        ],
        "published_date": [[]],  # No date available
        "citations": 3,
    },
    {
        "title": [
            "SEEDPOD Ground Risk: A Python application and framework for assessing the risk to people on the ground from uncrewed aerial vehicles (UAVs)"
        ],
        "published_date": [["2022", "3", ""]],
        "citations": 1,
    },
    {
        "title": [
            "DIANNA: Deep Insight And Neural Network Analysis, explainability in time series"
        ],
        "published_date": [[2022, 12, 15]],
        "citations": 1,
    },
    {
        "title": [
            ["diman: A Clojure Package for Dimensional Analysis and Unit Checking"]
        ],
        "published_date": [[2022, 1]],
        "citations": 0,
    },
    {
        "title": [
            "PERFORM: A Python package for developing reduced-order models for flow simulation"
        ],
        "published_date": [[9999]],
        "citations": 3,
    },
    {
        "title": ["TLViz: Visualising and analysing tensor decompositions"],
        "published_date": [[2022, 11, 25]],
        "citations": 2,
    },
    {
        "title": ["ALUES: R package for Agricultural Land Use Evaluation System"],
        "published_date": [[2022, 5, 12]],
        "citations": 1,
    },
    {
        "title": [
            [
                "Spiner: Performance Portable Routines for Generalized SpMV and Triangular Solvers"
            ]
        ],
        "published_date": [[2022, 7, 5]],
        "citations": 0,
    },
    {
        "title": ["pyndl: Naïve Discriminative Learning in Python"],
        "published_date": [[2022, 12, 15]],
        "citations": 0,
    },
    {
        "title": ["HostPhot: global and local photometry of galaxies"],
        "published_date": [[2022, 8, 15]],
        "citations": 1,
    },
    {
        "title": [
            "QMKPy: A Python Testbed for the Quadratic Multichannel Kalman Filter"
        ],
        "published_date": [[2022, 11, 2]],
        "citations": 0,
    },
]

joss_pubs_df = pd.DataFrame(joss_pubs)
joss_pubs_df.head(15)
    title                                              published_date    citations
0   [bmiptools: BioMaterials Image Processing Tools]   [[2022, 11, 27]]  2
1   [[QuasinormalModes.jl: A Julia package for com...  [2022, 5, 25]     2
2   [CWInPy: A Python package for inference with c...  [[2022, 9, 29]]   3
3   [Nempy: A Python package for modelling the Aus...  [[]]              2
4   [Spectral Connectivity: a python package for c...  [[]]              3
5   [SEEDPOD Ground Risk: A Python application and...  [[2022, 3, ]]     1
6   [DIANNA: Deep Insight And Neural Network Analy...  [[2022, 12, 15]]  1
7   [[diman: A Clojure Package for Dimensional Ana...  [[2022, 1]]       0
8   [PERFORM: A Python package for developing redu...  [[9999]]          3
9   [TLViz: Visualising and analysing tensor decom...  [[2022, 11, 25]]  2
10  [ALUES: R package for Agricultural Land Use Ev...  [[2022, 5, 12]]   1
11  [[Spiner: Performance Portable Routines for Ge...  [[2022, 7, 5]]    0
12  [pyndl: Naïve Discriminative Learning in Python]   [[2022, 12, 15]]  0
13  [HostPhot: global and local photometry of gala...  [[2022, 8, 15]]   1
14  [QMKPy: A Python Testbed for the Quadratic Mul...  [[2022, 11, 2]]   0
def format_date(date_parts: list) -> pd.Timestamp:
    """
    Format date parts into a timestamp.

    Parameters
    ----------
    date_parts : list
        List containing year, month, and day.

    Returns
    -------
    pd.Timestamp
        A date formatted as a pd.Timestamp object.
    """
    # A print statement might help you identify the issue
    print(f"The input value is: {date_parts}")
    date_str = f"{date_parts[0][0]}-{date_parts[0][1]:02d}-{date_parts[0][2]:02d}"
    return pd.to_datetime(date_str, format="%Y-%m-%d")
joss_pubs_df["published_date"][0]
[['2022', '11', '27']]
# Format date fails on row 3
format_date(joss_pubs_df["published_date"][2])
The input value is: [[2022, 9, '29']]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[9], line 2
      1 # Format date fails on row 3
----> 2 format_date(joss_pubs_df["published_date"][2])

Cell In[7], line 17, in format_date(date_parts)
     15 # A print statement might help you identify the issue
     16 print(f"The input value is: {date_parts}")
---> 17 date_str = f"{date_parts[0][0]}-{date_parts[0][1]:02d}-{date_parts[0][2]:02d}"
     18 return pd.to_datetime(date_str, format="%Y-%m-%d")

ValueError: Unknown format code 'd' for object of type 'str'
# Format date runs fine on row 14
format_date(joss_pubs_df["published_date"][13])
The input value is: [[2022, 8, 15]]
Timestamp('2022-08-15 00:00:00')
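
If you get stuck, the sketch below shows one possible shape for a more defensive version. It is not the lesson's solution, and it rests on several assumptions that you may want to change: the name format_date_safe is hypothetical, a year of 9999 is treated as a missing date, a missing month or day defaults to 1, and any value that can't be converted to an integer makes the whole date missing (pd.NaT).

```python
import pandas as pd

MISSING_YEAR = 9999  # assumption: 9999 marks a missing date


def format_date_safe(date_parts):
    """Return a pandas.Timestamp, or pd.NaT when the date can't be built."""
    if not isinstance(date_parts, list):
        return pd.NaT
    # Unnest [[year, month, day]]; leave a flat [year, month, day] alone
    if date_parts and isinstance(date_parts[0], list):
        date_parts = date_parts[0]
    try:
        numbers = [int(part) for part in date_parts]
    except (TypeError, ValueError):
        # For example "" or None can't be converted to an integer
        return pd.NaT
    if not numbers or numbers[0] == MISSING_YEAR:
        return pd.NaT
    # Assumption: a missing month or day defaults to 1
    year, month, day = (numbers + [1, 1])[:3]
    try:
        return pd.Timestamp(year=year, month=month, day=day)
    except ValueError:
        # For example a month of 13 or a day of 32
        return pd.NaT


samples = pd.Series([[[2022, 9, "29"]], [2022, "5", 25], [[""]], [[]], [[2022, 1]], [[9999]]])
print(samples.apply(format_date_safe))
```

Returning pd.NaT keeps the column in a datetime type, so later time-based analysis still works and the missing dates stay easy to count.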

How to apply functions to DataFrame values: .apply()#

The .apply() method allows you to apply any function to the values, rows, or columns of a pandas object. Called on a single column (a pandas.Series), it runs the function on each value. Called on a pandas.DataFrame, it runs the function on each column (axis=0, the default) or on each row (axis=1).

For example, if you want to apply a function to each row of a DataFrame, you would use df.apply(your_function, axis=1). This function is especially useful for applying logic that can’t be easily achieved with built-in pandas functions, allowing for more flexibility in data processing.

You can use .apply in pandas to replace explicit for loops when processing row and column values in a pandas.DataFrame.

# Apply the format_date function to every row in the published_date column
joss_pubs_df['published_date'].apply(format_date)
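
To see what each form of .apply() passes to your function, it can help to try it on a tiny DataFrame first. The data below is invented for illustration: the downloads column and the double function are not part of the JOSS sample data.

```python
import pandas as pd

# Hypothetical demo data, not taken from the JOSS files
demo_df = pd.DataFrame({"citations": [2, 3, 0], "downloads": [10, 20, 30]})


def double(value):
    """Return twice the input value."""
    return value * 2


# Series.apply calls the function once per value in the column
doubled = demo_df["citations"].apply(double)

# DataFrame.apply with axis=1 calls the function once per row
row_totals = demo_df.apply(lambda row: row["citations"] + row["downloads"], axis=1)

# DataFrame.apply with axis=0 (the default) calls the function once per column
column_maxima = demo_df.apply(max)

print(doubled.tolist())
print(row_totals.tolist())
print(column_maxima.to_dict())
```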

Tip

Important: It is ok if you can’t get the code to run fully by the end of this workshop. If you can:

  1. identify at least one of the data processing “bugs” (even if you can’t fix it) and/or

  2. fix at least one bug

You can consider your effort today as a success!

Activity 3, part 3#

Activity 3.3

Your goal in this activity is to generate a list of all package names found in the example CrossRef data. Below is a clean_title function and a small workflow that parses through all titles in the sample data.

However, the function isn’t working as expected. Add checks to the clean_title function to ensure it correctly extracts the title of each package in each publication.

joss_pubs_df["title"].head(15)
0      [bmiptools: BioMaterials Image Processing Tools]
1     [[QuasinormalModes.jl: A Julia package for com...
2     [CWInPy: A Python package for inference with c...
3     [Nempy: A Python package for modelling the Aus...
4     [Spectral Connectivity: a python package for c...
5     [SEEDPOD Ground Risk: A Python application and...
6     [DIANNA: Deep Insight And Neural Network Analy...
7     [[diman: A Clojure Package for Dimensional Ana...
8     [PERFORM: A Python package for developing redu...
9     [TLViz: Visualising and analysing tensor decom...
10    [ALUES: R package for Agricultural Land Use Ev...
11    [[Spiner: Performance Portable Routines for Ge...
12     [pyndl: Naïve Discriminative Learning in Python]
13    [HostPhot: global and local photometry of gala...
14    [QMKPy: A Python Testbed for the Quadratic Mul...
Name: title, dtype: object
def clean_title(title):
    """Get package name from a crossref title string.

    Parameters
    ----------
    title : str
        The title string containing a package name followed by a colon and description.

    Returns
    -------
    str
        The package name before the colon.

    """

    return title[0].split(":")
# Add checks to the clean_title function to make sure this code runs
all_titles = []
for a_title in joss_pubs_df["title"]:
    all_titles.append(clean_title(a_title))
all_titles
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[13], line 4
      2 all_titles = []
      3 for a_title in joss_pubs_df["title"]:
----> 4     all_titles.append(clean_title(a_title))
      5 all_titles

Cell In[12], line 16, in clean_title(title)
      1 def clean_title(title):
      2     """Get package name from a crossref title string.
      3 
      4     Parameters
   (...)
     13 
     14     """
---> 16     return title[0].split(":")

AttributeError: 'list' object has no attribute 'split'
a = joss_pubs_df["title"][0]
a[0].split(":")
# joss_pubs_df["title"][0]
['bmiptools', ' BioMaterials Image Processing Tools']
# The title value in the first row of the df
print(joss_pubs_df["title"][0])
print(type(joss_pubs_df["title"][0]))
['bmiptools: BioMaterials Image Processing Tools']
<class 'list'>
# The title value unnested from the list
print(joss_pubs_df["title"][0][0])
print(type(joss_pubs_df["title"][0][0]))
bmiptools: BioMaterials Image Processing Tools
<class 'str'>
print(f"The value is {joss_pubs_df['title'][0]}")
clean_title(joss_pubs_df["title"][0])
The value is ['bmiptools: BioMaterials Image Processing Tools']
['bmiptools', ' BioMaterials Image Processing Tools']
clean_title(joss_pubs_df["title"][1])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[18], line 1
----> 1 clean_title(joss_pubs_df["title"][1])

Cell In[12], line 16, in clean_title(title)
      1 def clean_title(title):
      2     """Get package name from a crossref title string.
      3 
      4     Parameters
   (...)
     13 
     14     """
---> 16     return title[0].split(":")

AttributeError: 'list' object has no attribute 'split'
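
The error above comes from titles that are nested one level deeper than expected. One possible way to handle this, sketched below under a hypothetical name (clean_title_safe), is to unnest until a string is reached and to return None when no usable title is found. Returning None for empty or unexpected values is an assumption. You may prefer to raise an error instead.

```python
def clean_title_safe(title):
    """Return the package name before the first colon, or None if there is no title."""
    # Unnest values such as [["name: description"]] until a non-list is reached
    while isinstance(title, list):
        if not title:
            return None
        title = title[0]
    if not isinstance(title, str):
        return None
    # Keep only the text before the first colon
    return title.split(":", 1)[0].strip()


print(clean_title_safe(["bmiptools: BioMaterials Image Processing Tools"]))
print(clean_title_safe([["diman: A Clojure Package for Dimensional Analysis and Unit Checking"]]))
print(clean_title_safe([]))
```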

On your own#

On Your Own 1

If you complete all the activities above, consider this challenge. Fix the workflow below so it runs. To do this, you can use the results of the functions you worked on above.

# Full code snippet
import json
from pathlib import Path

import pandas as pd


def load_clean_json(file_path, columns_to_keep):
    """
    Load JSON data from a file. Drop unnecessary columns and normalize
    to DataFrame.

    Parameters
    ----------
    file_path : Path
        Path to the JSON file.
    columns_to_keep : list
        List of columns to keep in the DataFrame.

    Returns
    -------
    pandas.DataFrame
        Loaded JSON data.
    """

    with file_path.open("r") as json_file:
        json_data = json.load(json_file)
    normalized_data = pd.json_normalize(json_data)

    return normalized_data.filter(items=columns_to_keep)


def format_date(date_parts: list) -> str:
    """
    Format date parts into a string.

    Parameters
    ----------
    date_parts : list
        List containing year, month, and day.

    Returns
    -------
    pd.datetime
        A date formatted as a `pd.datetime` object.
    """
    date_str = f"{date_parts[0][0]}-{date_parts[0][1]:02d}-{date_parts[0][2]:02d}"
    return pd.to_datetime(date_str, format="%Y-%m-%d")


def clean_title(value):
    """Removes a value contained in a list.

    Parameters
    ----------
    value : list
        A list containing one or more elements.

    Returns
    -------
    Any
        The first element of the list `value`.
    """
    print("hi", value)
    return value[0]


columns_to_keep = [
    "publisher",
    "DOI",
    "type",
    "author",
    "is-referenced-by-count",
    "title",
    "published.date-parts",
]

data_dir = Path("data")

all_papers_list = []
for json_file in sorted(data_dir.glob("*.json")):
    print(json_file)
    papers_df = load_clean_json(json_file, columns_to_keep)
    papers_df["published_date"] = papers_df["published.date-parts"].apply(format_date)
    papers_df["title"] = papers_df["title"].apply(clean_title)

    all_papers_list.append(papers_df)

all_papers_df = pd.concat(all_papers_list, axis=0, ignore_index=True)

print("Final shape of combined DataFrame:", all_papers_df.shape)
data/2022-01-joss-publications.json
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[19], line 84
     82 print(json_file)
     83 papers_df = load_clean_json(json_file, columns_to_keep)
---> 84 papers_df["published_date"] = papers_df["published.date-parts"].apply(format_date)
     85 papers_df["title"] = papers_df["title"].apply(clean_title)
     87 all_papers_list.append(papers_df)

File ~/work/lessons/lessons/.nox/docs-test/lib/python3.11/site-packages/pandas/core/series.py:4924, in Series.apply(self, func, convert_dtype, args, by_row, **kwargs)
   4789 def apply(
   4790     self,
   4791     func: AggFuncType,
   (...)
   4796     **kwargs,
   4797 ) -> DataFrame | Series:
   4798     """
   4799     Invoke function on values of Series.
   4800 
   (...)
   4915     dtype: float64
   4916     """
   4917     return SeriesApply(
   4918         self,
   4919         func,
   4920         convert_dtype=convert_dtype,
   4921         by_row=by_row,
   4922         args=args,
   4923         kwargs=kwargs,
-> 4924     ).apply()

File ~/work/lessons/lessons/.nox/docs-test/lib/python3.11/site-packages/pandas/core/apply.py:1427, in SeriesApply.apply(self)
   1424     return self.apply_compat()
   1426 # self.func is Callable
-> 1427 return self.apply_standard()

File ~/work/lessons/lessons/.nox/docs-test/lib/python3.11/site-packages/pandas/core/apply.py:1507, in SeriesApply.apply_standard(self)
   1501 # row-wise access
   1502 # apply doesn't have a `na_action` keyword and for backward compat reasons
   1503 # we need to give `na_action="ignore"` for categorical data.
   1504 # TODO: remove the `na_action="ignore"` when that default has been changed in
   1505 #  Categorical (GH51645).
   1506 action = "ignore" if isinstance(obj.dtype, CategoricalDtype) else None
-> 1507 mapped = obj._map_values(
   1508     mapper=curried, na_action=action, convert=self.convert_dtype
   1509 )
   1511 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1512     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1513     #  See also GH#25959 regarding EA support
   1514     return obj._constructor_expanddim(list(mapped), index=obj.index)

File ~/work/lessons/lessons/.nox/docs-test/lib/python3.11/site-packages/pandas/core/base.py:921, in IndexOpsMixin._map_values(self, mapper, na_action, convert)
    918 if isinstance(arr, ExtensionArray):
    919     return arr.map(mapper, na_action=na_action)
--> 921 return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)

File ~/work/lessons/lessons/.nox/docs-test/lib/python3.11/site-packages/pandas/core/algorithms.py:1743, in map_array(arr, mapper, na_action, convert)
   1741 values = arr.astype(object, copy=False)
   1742 if na_action is None:
-> 1743     return lib.map_infer(values, mapper, convert=convert)
   1744 else:
   1745     return lib.map_infer_mask(
   1746         values, mapper, mask=isna(values).view(np.uint8), convert=convert
   1747     )

File lib.pyx:2972, in pandas._libs.lib.map_infer()

Cell In[19], line 47, in format_date(date_parts)
     33 def format_date(date_parts: list) -> str:
     34     """
     35     Format date parts into a string.
     36 
   (...)
     45         A date formatted as a `pd.datetime` object.
     46     """
---> 47     date_str = f"{date_parts[0][0]}-{date_parts[0][1]:02d}-{date_parts[0][2]:02d}"
     48     return pd.to_datetime(date_str, format="%Y-%m-%d")

ValueError: Unknown format code 'd' for object of type 'str'