Tests & Checks for your code: Activity 3#
In activity 1, you made your code cleaner and more usable using expressive variable names and docstrings to document the module.
In activity 2, you made your code more DRY (“Don’t Repeat Yourself”) using functions and conditionals.
In this activity, you will build checks into your workflow using try/except blocks added to functions to handle some “features” found in the JOSS, CrossRef citation data.
Note
A data feature, as defined here, represents unexpected values that may be found in real-world data. You will rarely find that your data can be processed without some cleaning steps!
Real world data processing & workflows and edge cases#
Real-world data can rarely be imported without cleanup steps. You will often find unusual data values you don’t expect. Sometimes, these values are documented: for example, a 9999 may represent a missing value in a dataset, and the dataset’s documentation tells you so. Yay!
Other times, the data contains undocumented typos and other errors that you need to handle in your code. In this activity, you will see these unusual values referred to as data “edge cases.”
Writing robust code that handles unexpected values will make your code run smoothly and fail gracefully. This type of code, which combines functions (or classes) and checks within the functions that handle messy data, will make your code easier to maintain.
Strategies for handling messy data#
There are several strategies that you can employ to handle unusual data values. In this activity, you will apply the following strategies to make your code more robust, maintainable & usable:
conditional statements to check for specific conditions before executing code. This allows you to create different pathways for code to execute based on specific conditions.
try/except blocks allow you to handle potential errors by attempting an operation and catching any exceptions if they occur, providing useful feedback. Sometimes, you may want the program to end on an error. In other cases, you may want to handle it in a specific way.
fail fast with useful error messages: Failing fast is a software engineering term that means allowing your code to stop when something goes wrong, ensuring that errors are caught and communicated promptly. This helps the user quickly understand the error, what went wrong, and where.
Tip
As you make decisions about adding checks to your code, weigh the value of using a Pythonic approach (ask for forgiveness later) vs. literal checks (look before you leap) to address potential errors in your code. This means asking yourself if the code should attempt the operation and handle any exception afterward, or check an object’s state or type before proceeding.
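The two approaches in the tip can be sketched side by side. This is a minimal illustration, not the lesson's own code: the helper names `read_text_lbyl` and `read_text_eafp` are hypothetical, and returning `None` is just one way to signal a missing file.

```python
from pathlib import Path

file_path = Path("data-bad-path") / "2022-03-joss-publications.json"


def read_text_lbyl(path: Path):
    """Look before you leap: check the path's state before reading it."""
    if not path.is_file():
        return None
    return path.read_text()


def read_text_eafp(path: Path):
    """Ask for forgiveness: try to read and handle the failure afterward."""
    try:
        return path.read_text()
    except FileNotFoundError:
        return None


print(read_text_lbyl(file_path))
print(read_text_eafp(file_path))
```

Both helpers give the same result for a missing file. The difference is where the decision is made: before the operation (a conditional) or after it fails (a try/except block).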
# The FileNotFoundError is caught here, so this code does not fail fast
import json
from pathlib import Path

import pandas as pd

file_path = Path("data-bad-path") / "2022-03-joss-publications.json"

try:
    print(file_path)
    with file_path.open("r") as json_file:
        json_data = json.load(json_file)
    json_clean = pd.json_normalize(json_data)
except FileNotFoundError:
    print("This doesn't fail fast, it only prints a message")

print("Look, i keep running after the try/except block which means I didn't fail")
data-bad-path/2022-03-joss-publications.json
This doesn't fail fast, it only prints a message
Look, i keep running after the try/except block which means I didn't fail
Failing fast#
If you are processing specific data in your workflow, then ensuring your code can successfully find the data is your first (and possibly most important) goal.
Consider: How does your code handle and tell a user that it can’t find the data that you want it to open?
If your code doesn’t fail fast with a useful error message, and instead continues to run and fails later, it can confuse a user. The error raised later probably won’t tell the user that the real issue is missing data rather than something else.
This can mislead anyone who is trying to troubleshoot your code.
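As a contrast to the earlier example that only prints a message, here is a minimal sketch of a loader that fails fast. The function name `load_json` and the wording of the message are assumptions made for illustration; the key idea is that the exception is re-raised with a clearer message instead of being swallowed.

```python
import json
from pathlib import Path


def load_json(file_path: Path):
    """Load a JSON file, failing fast with a clear message if it is missing.

    `load_json` is a hypothetical helper name used only for this sketch.
    """
    try:
        with file_path.open("r") as json_file:
            return json.load(json_file)
    except FileNotFoundError as err:
        # Re-raise the same exception type with a more helpful message so the
        # program stops here instead of failing later with a confusing error.
        raise FileNotFoundError(
            f"Oops! I couldn't find the file at {file_path}. "
            "Please check that the data directory exists."
        ) from err
```

Calling `load_json(Path("data-bad-path") / "2022-03-joss-publications.json")` stops the program at the point where the file is missing, and the message names the path that was tried.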
Functions, classes, and methods are a tool#
Using functions and class methods is a great first step in handling messy data. A function or method provides a modular unit you can test outside the workflow for the edge cases you may encounter. Also, because a function is a modular unit, you can add elements to handle unexpected processing features as you build your workflow.
Once you have these functions and methods, you can add checks using conditional statements and try/except blocks that anticipate edge cases and errors you may encounter when processing your data.
Activity 3, part 1#
Turn the code below into a function. Add a try/except statement that raises a FileNotFoundError with a custom message that will stop the code from continuing to run.
from pathlib import Path

file_path = Path("data-bad-path") / "2022-03-joss-publications.json"
print(file_path)
with file_path.open("r") as json_file:
    json_data = json.load(json_file)
json_clean = pd.json_normalize(json_data)
json_clean.head(2)
data-bad-path/2022-03-joss-publications.json
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Cell In[2], line 5
3 file_path = Path("data-bad-path") / "2022-03-joss-publications.json"
4 print(file_path)
----> 5 with file_path.open("r") as json_file:
6 json_data = json.load(json_file)
7 json_clean = pd.json_normalize(json_data)
File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/pathlib.py:1044, in Path.open(self, mode, buffering, encoding, errors, newline)
1042 if "b" not in mode:
1043 encoding = io.text_encoding(encoding)
-> 1044 return io.open(self, mode, buffering, encoding, errors, newline)
FileNotFoundError: [Errno 2] No such file or directory: 'data-bad-path/2022-03-joss-publications.json'
Activity 3, part 2#
Turn the code below into a function. Modify the function so it catches the error and prints a custom error message but does not stop your code from continuing to run.
Consider the difference between the function that you created above, which stops your code from running by raising an exception, and this one, which prints a statement for the user.
from pathlib import Path

file_path = Path("data-bad-path") / "2022-03-joss-publications.json"
print(file_path)
with file_path.open("r") as json_file:
    json_data = json.load(json_file)
json_clean = pd.json_normalize(json_data)
json_clean.head(2)
data-bad-path/2022-03-joss-publications.json
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Cell In[3], line 5
3 file_path = Path("data-bad-path") / "2022-03-joss-publications.json"
4 print(file_path)
----> 5 with file_path.open("r") as json_file:
6 json_data = json.load(json_file)
7 json_clean = pd.json_normalize(json_data)
File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/pathlib.py:1044, in Path.open(self, mode, buffering, encoding, errors, newline)
1042 if "b" not in mode:
1043 encoding = io.text_encoding(encoding)
-> 1044 return io.open(self, mode, buffering, encoding, errors, newline)
FileNotFoundError: [Errno 2] No such file or directory: 'data-bad-path/2022-03-joss-publications.json'
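One possible shape for a version that reports the problem and keeps running is sketched below. It is an illustration under stated assumptions, not the lesson's solution: the name `load_json_or_none` is hypothetical, and returning `None` is only one way to tell the caller that nothing was loaded.

```python
import json
from pathlib import Path


def load_json_or_none(file_path: Path):
    """Return parsed JSON, or None (after printing a message) if the file is missing.

    `load_json_or_none` is a hypothetical name used only for this sketch.
    """
    try:
        with file_path.open("r") as json_file:
            return json.load(json_file)
    except FileNotFoundError:
        # Report the problem but let the caller keep running
        print(f"Could not find {file_path}; skipping it.")
        return None


result = load_json_or_none(Path("data-bad-path") / "2022-03-joss-publications.json")
print("Still running; result is", result)
```

The trade-off is that every caller now has to check for `None`. If the workflow cannot do anything useful without the file, raising an exception is usually the clearer choice.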
Activity 3, part 2: Find the data & fail fast when it’s missing#
Activity 3, part 1 code example#
Consider the code below. Note that the code has an incorrect /data directory path that doesn’t exist. Notice that the error that is thrown after running the code is not a FileNotFoundError. Instead, it raises a ValueError (ValueError: No objects to concatenate), which is much less useful to a user (who could be your future self).
Group work
In small groups, consider the code and answer the following questions together.
Questions:
Does the code fail fast?
What type of error do you want Python to throw when it can’t find a data file? Use Google, LLMs, or our tests and checks lesson to help figure this out.
Does the code handle the actual error gracefully?
How can you make the code better handle missing data files?
def load_clean_json(file_path, columns_to_keep):
    """
    Load JSON data from a file. Drop unnecessary columns and normalize
    to DataFrame.

    Parameters
    ----------
    file_path : Path
        Path to the JSON file.
    columns_to_keep : list
        List of columns to keep in the DataFrame.

    Returns
    -------
    dict
        Loaded JSON data.
    """

    with file_path.open("r") as json_file:
        json_data = json.load(json_file)
    return pd.json_normalize(json_data)


load_clean_json("path-here")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[4], line 24
20 json_data = json.load(json_file)
21 return pd.json_normalize(json_data)
---> 24 load_clean_json("path-here")
TypeError: load_clean_json() missing 1 required positional argument: 'columns_to_keep'
import json
import os
from pathlib import Path

import pandas as pd


def load_clean_json(file_path, columns_to_keep):
    """
    Load JSON data from a file. Drop unnecessary columns and normalize
    to DataFrame.

    Parameters
    ----------
    file_path : Path
        Path to the JSON file.
    columns_to_keep : list
        List of columns to keep in the DataFrame.

    Returns
    -------
    dict
        Loaded JSON data.
    """

    with file_path.open("r") as json_file:
        json_data = json.load(json_file)
    return pd.json_normalize(json_data)


columns_to_keep = [
    "publisher",
    "DOI",
    "type",
    "author",
    "is-referenced-by-count",
    "title",
    "published.date-parts",
]

# Notice that this is bad data dir
# What happens when your code runs?
data_dir = Path("data-bad")
files = [
    "2022-01-joss-publications.json",
    "2022-02-joss-publications.json",
    "2022-03-joss-publications.json",
]
# Create a list of Path objects
all_files = [data_dir / file for file in files]

all_papers_list = []
# An empty iterator will never run
for json_file in all_files:
    papers_df = load_clean_json(json_file, columns_to_keep)
    all_papers_list.append(papers_df)

all_papers_df = pd.concat(all_papers_list, axis=0, ignore_index=True)
all_papers_df
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Cell In[5], line 56
54 # An empty iterator will never run
55 for json_file in all_files:
---> 56 papers_df = load_clean_json(json_file, columns_to_keep)
57 all_papers_list.append(papers_df)
59 all_papers_df = pd.concat(all_papers_list, axis=0, ignore_index=True)
Cell In[5], line 26, in load_clean_json(file_path, columns_to_keep)
8 def load_clean_json(file_path, columns_to_keep):
9 """
10 Load JSON data from a file. Drop unnecessary columns and normalize
11 to DataFrame.
(...)
23 Loaded JSON data.
24 """
---> 26 with file_path.open("r") as json_file:
27 json_data = json.load(json_file)
28 return pd.json_normalize(json_data)
File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/pathlib.py:1044, in Path.open(self, mode, buffering, encoding, errors, newline)
1042 if "b" not in mode:
1043 encoding = io.text_encoding(encoding)
-> 1044 return io.open(self, mode, buffering, encoding, errors, newline)
FileNotFoundError: [Errno 2] No such file or directory: 'data-bad/2022-01-joss-publications.json'
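The ValueError: No objects to concatenate described earlier in this section can arise when the list of files is built with a glob pattern instead of being typed out. A glob on a directory that doesn't exist yields no files, so the loop never runs and the failure only surfaces at the concatenation step. In the run shown above, the file names are listed explicitly, so Python reaches the open call and raises FileNotFoundError. The sketch below shows the empty glob and one way to fail fast on it; `check_files_found` is a hypothetical helper, not part of the lesson code.

```python
from pathlib import Path

data_dir = Path("data-bad")  # this directory does not exist

# glob() on a missing directory yields nothing instead of raising an error
all_files = sorted(data_dir.glob("*.json"))
print(all_files)


def check_files_found(files, directory):
    """Raise early if no data files were found (hypothetical helper)."""
    if not files:
        raise FileNotFoundError(
            f"No .json files found in '{directory}'. Check the data directory path."
        )
    return files
```

Calling `check_files_found(all_files, data_dir)` before the loop turns a silent empty list into an immediate, descriptive error.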
Activity 3: part 1 try/excepts & files#
Modify the load_clean_json function with a try/except block so that it returns a custom error message when it can’t find a file but returns the normalized data when it can.
Activity 3, part 3: Add checks to the format_date function#
The code below creates a pandas.DataFrame with the first 15 publications in the JOSS sample data.json file. This is the first of 3 files you must process in your workflow.
Your first task is to process and format the published_date column in the data to make it a pandas.Timestamp object. Having a date in a datetime format like pandas.Timestamp or datetime.datetime will allow you to do time-related analysis on your data, such as counting publications by month and year! The expected CrossRef published date should be:
"published": {
    "date-parts": [
        [
            2022,
            11,
            27
        ]
    ]
}
However, the dates in the sample data are not always formatted as shown above.
For this activity, focus on adding checks to the format_date function. IMPORTANT: Use the sample data provided below for your troubleshooting exercise. This will allow you to focus on fixing only one function rather than trying to troubleshoot the entire workflow!
Activity 2: Part 2
In small groups, do the following:
Evaluate the published_date field in the data created below and answer the question:
Do you see any unusually-formatted values that may be responsible for making your code above fail?
Once you have a list of issues you observe in the data, address them by modifying the format_date function below.
Format dates with pandas.to_datetime()#
Let’s work on formatting dates so there is a consistent format in our dataframe. Python has a string formatting language that defines useful characters for formatting.
What Does 02d Mean?
d: This part of the format code means you’re expecting an integer. It tells Python to format the value as a decimal (whole number).
02: The 02 means the number should be padded with leading zeros if necessary, so the total width is 2 digits. For example:
1 becomes 01
5 becomes 05
12 stays as 12 (no padding needed)
This is especially useful for formatting months or days, which often require a MM-DD format (e.g., 01-05 for January 5th).
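The short sketch below shows the 02d format code in action, including what happens when the value is a string rather than an integer. The variable names are made up for illustration.

```python
month, day = 1, 5

# 02d pads an integer with leading zeros to a width of 2
print(f"{month:02d}-{day:02d}")

# Strings do not accept the 'd' format code, so convert them with int() first
day_as_text = "5"
try:
    print(f"{day_as_text:02d}")
except ValueError as err:
    print(f"ValueError: {err}")

print(f"{int(day_as_text):02d}")
```

The middle case is worth remembering when the date parts in your data arrive as a mix of integers and strings.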
import pandas as pd
# Manually recreate data for the first 15 crossref entries
joss_pubs = [
{
"title": ["bmiptools: BioMaterials Image Processing Tools"],
"published_date": [["2022", "11", "27"]],
"citations": 2,
},
{
"title": [
[
"QuasinormalModes.jl: A Julia package for computing discrete eigenvalues of second order ODEs"
]
],
"published_date": [2022, "5", 25],
"citations": 2,
},
{
"title": [
"CWInPy: A Python package for inference with continuous gravitational-wave signals from pulsars"
],
"published_date": [[2022, 9, "29"]],
"citations": 3,
},
{
"title": [
"Nempy: A Python package for modelling the Australian National Electricity Market dispatch procedure"
],
"published_date": [[""]],
"citations": 2,
},
{
"title": [
"Spectral Connectivity: a python package for computing spectral coherence and related measures"
],
"published_date": [[]], # No date available
"citations": 3,
},
{
"title": [
"SEEDPOD Ground Risk: A Python application and framework for assessing the risk to people on the ground from uncrewed aerial vehicles (UAVs)"
],
"published_date": [["2022", "3", ""]],
"citations": 1,
},
{
"title": [
"DIANNA: Deep Insight And Neural Network Analysis, explainability in time series"
],
"published_date": [[2022, 12, 15]],
"citations": 1,
},
{
"title": [
["diman: A Clojure Package for Dimensional Analysis and Unit Checking"]
],
"published_date": [[2022, 1]],
"citations": 0,
},
{
"title": [
"PERFORM: A Python package for developing reduced-order models for flow simulation"
],
"published_date": [[9999]],
"citations": 3,
},
{
"title": ["TLViz: Visualising and analysing tensor decompositions"],
"published_date": [[2022, 11, 25]],
"citations": 2,
},
{
"title": ["ALUES: R package for Agricultural Land Use Evaluation System"],
"published_date": [[2022, 5, 12]],
"citations": 1,
},
{
"title": [
[
"Spiner: Performance Portable Routines for Generalized SpMV and Triangular Solvers"
]
],
"published_date": [[2022, 7, 5]],
"citations": 0,
},
{
"title": ["pyndl: Naïve Discriminative Learning in Python"],
"published_date": [[2022, 12, 15]],
"citations": 0,
},
{
"title": ["HostPhot: global and local photometry of galaxies"],
"published_date": [[2022, 8, 15]],
"citations": 1,
},
{
"title": [
"QMKPy: A Python Testbed for the Quadratic Multichannel Kalman Filter"
],
"published_date": [[2022, 11, 2]],
"citations": 0,
},
]
joss_pubs_df = pd.DataFrame(joss_pubs)
joss_pubs_df.head(15)
 | title | published_date | citations |
---|---|---|---|
0 | [bmiptools: BioMaterials Image Processing Tools] | [[2022, 11, 27]] | 2 |
1 | [[QuasinormalModes.jl: A Julia package for com... | [2022, 5, 25] | 2 |
2 | [CWInPy: A Python package for inference with c... | [[2022, 9, 29]] | 3 |
3 | [Nempy: A Python package for modelling the Aus... | [[]] | 2 |
4 | [Spectral Connectivity: a python package for c... | [[]] | 3 |
5 | [SEEDPOD Ground Risk: A Python application and... | [[2022, 3, ]] | 1 |
6 | [DIANNA: Deep Insight And Neural Network Analy... | [[2022, 12, 15]] | 1 |
7 | [[diman: A Clojure Package for Dimensional Ana... | [[2022, 1]] | 0 |
8 | [PERFORM: A Python package for developing redu... | [[9999]] | 3 |
9 | [TLViz: Visualising and analysing tensor decom... | [[2022, 11, 25]] | 2 |
10 | [ALUES: R package for Agricultural Land Use Ev... | [[2022, 5, 12]] | 1 |
11 | [[Spiner: Performance Portable Routines for Ge... | [[2022, 7, 5]] | 0 |
12 | [pyndl: Naïve Discriminative Learning in Python] | [[2022, 12, 15]] | 0 |
13 | [HostPhot: global and local photometry of gala... | [[2022, 8, 15]] | 1 |
14 | [QMKPy: A Python Testbed for the Quadratic Mul... | [[2022, 11, 2]] | 0 |
def format_date(date_parts: list) -> str:
    """
    Format date parts into a string.

    Parameters
    ----------
    date_parts : list
        List containing year, month, and day.

    Returns
    -------
    pd.datetime
        A date formatted as a pd.datetime object.
    """
    # A print statement might help you identify the issue
    print(f"The input value is: {date_parts}")
    date_str = f"{date_parts[0][0]}-{date_parts[0][1]:02d}-{date_parts[0][2]:02d}"
    return pd.to_datetime(date_str, format="%Y-%m-%d")
joss_pubs_df["published_date"][0]
[['2022', '11', '27']]
# Format date fails on row 3
format_date(joss_pubs_df["published_date"][2])
The input value is: [[2022, 9, '29']]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[9], line 2
1 # Format date fails on row 3
----> 2 format_date(joss_pubs_df["published_date"][2])
Cell In[7], line 17, in format_date(date_parts)
15 # A print statement might help you identify the issue
16 print(f"The input value is: {date_parts}")
---> 17 date_str = f"{date_parts[0][0]}-{date_parts[0][1]:02d}-{date_parts[0][2]:02d}"
18 return pd.to_datetime(date_str, format="%Y-%m-%d")
ValueError: Unknown format code 'd' for object of type 'str'
# Format date runs fine on row 14
format_date(joss_pubs_df["published_date"][13])
The input value is: [[2022, 8, 15]]
Timestamp('2022-08-15 00:00:00')
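One way to think about the checks this function needs is to separate "pull out clean integers" from "build the date". The sketch below covers only the first half and uses the standard library alone. `parse_date_parts` is a hypothetical helper, and returning `None` for unusable values is an assumption; you might prefer to raise an error instead.

```python
def parse_date_parts(date_parts):
    """Return (year, month, day) as integers, or None if the value is unusable.

    `parse_date_parts` is a hypothetical helper, not part of the lesson code.
    """
    try:
        # Unnest [[year, month, day]]; a flat [year, month, day] is used as-is
        parts = date_parts[0] if isinstance(date_parts[0], list) else date_parts
        year, month, day = (int(part) for part in parts)
    except (IndexError, TypeError, ValueError):
        # Empty lists, empty strings, and a missing month or day all land here
        return None
    return year, month, day


print(parse_date_parts([["2022", "11", "27"]]))
print(parse_date_parts([2022, "5", 25]))
print(parse_date_parts([[2022, 1]]))
```

Because the parts come back as integers, they can be formatted with 02d and passed to pd.to_datetime without the format-code error shown above. Note that this sketch does not treat placeholder values such as 9999 specially.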
How to apply functions to DataFrame values: .apply()#
The .apply() method allows you to apply any function to rows or columns in a pandas.DataFrame. For example, you can use it to perform operations on specific column or row values. When you use .apply(), you can specify whether you want to apply the function across columns (axis=0) (the default) or across rows (axis=1).
For example, if you want to apply a function to each row of a DataFrame, you would use df.apply(your_function, axis=1). This method is especially useful for applying logic that can’t be easily achieved with built-in pandas functions, allowing for more flexibility in data processing.
You can use .apply in pandas to replace explicit for loops when processing row and column values in a pandas.DataFrame.
# Apply the format_date function to every row in the published_date column
joss_pubs_df['published_date'].apply(format_date)
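Before applying a function that may fail, it can help to apply a simpler one that only inspects the values. The sketch below, which assumes pandas is installed, counts how many date parts each row holds so that short or empty entries stand out. The three records and the `count_date_parts` helper are made up for illustration.

```python
import pandas as pd

records = [
    {"published_date": [[2022, 11, 27]]},
    {"published_date": [[2022, 1]]},
    {"published_date": [[]]},
]
df = pd.DataFrame(records)


def count_date_parts(date_parts):
    """Count how many date parts a nested [[year, month, day]] value holds."""
    return len(date_parts[0])


# Series.apply calls the function once per value and returns a new Series
part_counts = df["published_date"].apply(count_date_parts)
print(part_counts.tolist())
```

Any row whose count is not 3 is a candidate edge case for the format_date function to handle.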
Tip
If you are using Jupyter, then you might find this page helpful when setting up debugging.
VS Code also has a nice visual debugger that you can use.
Important: It is ok if you can’t get the code to run fully by the end of this workshop. If you can:
identify at least one of the data processing “bugs” (even if you can’t fix it) and/or
fix at least one bug
You can consider your effort today as a success!
Activity 3, part 3#
Activity 3.3
Your goal in this activity is to generate a list of all package names found in the example CrossRef data. Below is a clean_title function and a small workflow that parses through all titles in the sample data. However, the function isn’t working as expected. Add checks to the clean_title function to ensure it correctly extracts the title of each package in each publication.
joss_pubs_df["title"].head(15)
0 [bmiptools: BioMaterials Image Processing Tools]
1 [[QuasinormalModes.jl: A Julia package for com...
2 [CWInPy: A Python package for inference with c...
3 [Nempy: A Python package for modelling the Aus...
4 [Spectral Connectivity: a python package for c...
5 [SEEDPOD Ground Risk: A Python application and...
6 [DIANNA: Deep Insight And Neural Network Analy...
7 [[diman: A Clojure Package for Dimensional Ana...
8 [PERFORM: A Python package for developing redu...
9 [TLViz: Visualising and analysing tensor decom...
10 [ALUES: R package for Agricultural Land Use Ev...
11 [[Spiner: Performance Portable Routines for Ge...
12 [pyndl: Naïve Discriminative Learning in Python]
13 [HostPhot: global and local photometry of gala...
14 [QMKPy: A Python Testbed for the Quadratic Mul...
Name: title, dtype: object
def clean_title(title):
    """Get package name from a crossref title string.

    Parameters
    ----------
    title : str
        The title string containing a package name followed by a colon and description.

    Returns
    -------
    str
        The package name before the colon.

    """

    return title[0].split(":")
# Add checks to the clean_title function to make sure this code runs
all_titles = []
for a_title in joss_pubs_df["title"]:
    all_titles.append(clean_title(a_title))
all_titles
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[13], line 4
2 all_titles = []
3 for a_title in joss_pubs_df["title"]:
----> 4 all_titles.append(clean_title(a_title))
5 all_titles
Cell In[12], line 16, in clean_title(title)
1 def clean_title(title):
2 """Get package name from a crossref title string.
3
4 Parameters
(...)
13
14 """
---> 16 return title[0].split(":")
AttributeError: 'list' object has no attribute 'split'
a = joss_pubs_df["title"][0]
a[0].split(":")
# joss_pubs_df["title"][0]
['bmiptools', ' BioMaterials Image Processing Tools']
# The title value in the first row of the df
print(joss_pubs_df["title"][0])
print(type(joss_pubs_df["title"][0]))
['bmiptools: BioMaterials Image Processing Tools']
<class 'list'>
# The title value unnested from the list
print(joss_pubs_df["title"][0][0])
print(type(joss_pubs_df["title"][0][0]))
bmiptools: BioMaterials Image Processing Tools
<class 'str'>
print(f"The value is {joss_pubs_df['title'][0]}")
clean_title(joss_pubs_df["title"][0])
The value is ['bmiptools: BioMaterials Image Processing Tools']
['bmiptools', ' BioMaterials Image Processing Tools']
clean_title(joss_pubs_df["title"][1])
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[18], line 1
----> 1 clean_title(joss_pubs_df["title"][1])
Cell In[12], line 16, in clean_title(title)
1 def clean_title(title):
2 """Get package name from a crossref title string.
3
4 Parameters
(...)
13
14 """
---> 16 return title[0].split(":")
AttributeError: 'list' object has no attribute 'split'
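The AttributeError above comes from a title that is nested one level deeper than the others. A minimal sketch of one way to handle both shapes follows; `get_package_name` is a hypothetical name, and raising on empty or non-string values is an assumption about how you want the function to fail.

```python
def get_package_name(title):
    """Return the package name from a CrossRef-style title value.

    `get_package_name` is a hypothetical name used for this sketch.
    """
    # Unnest until a string is reached, e.g. [["name: text"]] -> "name: text"
    while isinstance(title, list):
        if not title:
            raise ValueError("Title list is empty")
        title = title[0]
    if not isinstance(title, str):
        raise TypeError(f"Expected a string title, got {type(title).__name__}")
    # Keep only the text before the first colon
    return title.split(":")[0].strip()


print(get_package_name(["bmiptools: BioMaterials Image Processing Tools"]))
print(get_package_name([["diman: A Clojure Package for Dimensional Analysis and Unit Checking"]]))
```

This version uses a conditional check (look before you leap) to unnest the value. A try/except around the split call would be the ask-for-forgiveness alternative.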
On your own#
On Your Own 1
If you complete all the activities above, consider this challenge. Fix the workflow below so it runs. To do this, you can use the results of the functions you worked on above.
# Full code snippet
import json
from pathlib import Path

import pandas as pd


def load_clean_json(file_path, columns_to_keep):
    """
    Load JSON data from a file. Drop unnecessary columns and normalize
    to DataFrame.

    Parameters
    ----------
    file_path : Path
        Path to the JSON file.
    columns_to_keep : list
        List of columns to keep in the DataFrame.

    Returns
    -------
    dict
        Loaded JSON data.
    """

    with file_path.open("r") as json_file:
        json_data = json.load(json_file)

    normalized_data = pd.json_normalize(json_data)
    return normalized_data.filter(items=columns_to_keep)


def format_date(date_parts: list) -> str:
    """
    Format date parts into a string.

    Parameters
    ----------
    date_parts : list
        List containing year, month, and day.

    Returns
    -------
    pd.datetime
        A date formatted as a `pd.datetime` object.
    """
    date_str = f"{date_parts[0][0]}-{date_parts[0][1]:02d}-{date_parts[0][2]:02d}"
    return pd.to_datetime(date_str, format="%Y-%m-%d")


def clean_title(value):
    """Removes a value contained in a list.

    Parameters
    ----------
    value : list
        A list containing one or more elements.

    Returns
    -------
    Any
        The first element of the list `value`.
    """
    print("hi", value)
    return value[0]


columns_to_keep = [
    "publisher",
    "DOI",
    "type",
    "author",
    "is-referenced-by-count",
    "title",
    "published.date-parts",
]

data_dir = Path("data")

all_papers_list = []
for json_file in sorted(data_dir.glob("*.json")):
    print(json_file)
    papers_df = load_clean_json(json_file, columns_to_keep)
    papers_df["published_date"] = papers_df["published.date-parts"].apply(format_date)
    papers_df["title"] = papers_df["title"].apply(clean_title)

    all_papers_list.append(papers_df)

all_papers_df = pd.concat(all_papers_list, axis=0, ignore_index=True)

print("Final shape of combined DataFrame:", all_papers_df.shape)
data/2022-01-joss-publications.json
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[19], line 84
82 print(json_file)
83 papers_df = load_clean_json(json_file, columns_to_keep)
---> 84 papers_df["published_date"] = papers_df["published.date-parts"].apply(format_date)
85 papers_df["title"] = papers_df["title"].apply(clean_title)
87 all_papers_list.append(papers_df)
File ~/work/lessons/lessons/.nox/docs-test/lib/python3.11/site-packages/pandas/core/series.py:4924, in Series.apply(self, func, convert_dtype, args, by_row, **kwargs)
4789 def apply(
4790 self,
4791 func: AggFuncType,
(...)
4796 **kwargs,
4797 ) -> DataFrame | Series:
4798 """
4799 Invoke function on values of Series.
4800
(...)
4915 dtype: float64
4916 """
4917 return SeriesApply(
4918 self,
4919 func,
4920 convert_dtype=convert_dtype,
4921 by_row=by_row,
4922 args=args,
4923 kwargs=kwargs,
-> 4924 ).apply()
File ~/work/lessons/lessons/.nox/docs-test/lib/python3.11/site-packages/pandas/core/apply.py:1427, in SeriesApply.apply(self)
1424 return self.apply_compat()
1426 # self.func is Callable
-> 1427 return self.apply_standard()
File ~/work/lessons/lessons/.nox/docs-test/lib/python3.11/site-packages/pandas/core/apply.py:1507, in SeriesApply.apply_standard(self)
1501 # row-wise access
1502 # apply doesn't have a `na_action` keyword and for backward compat reasons
1503 # we need to give `na_action="ignore"` for categorical data.
1504 # TODO: remove the `na_action="ignore"` when that default has been changed in
1505 # Categorical (GH51645).
1506 action = "ignore" if isinstance(obj.dtype, CategoricalDtype) else None
-> 1507 mapped = obj._map_values(
1508 mapper=curried, na_action=action, convert=self.convert_dtype
1509 )
1511 if len(mapped) and isinstance(mapped[0], ABCSeries):
1512 # GH#43986 Need to do list(mapped) in order to get treated as nested
1513 # See also GH#25959 regarding EA support
1514 return obj._constructor_expanddim(list(mapped), index=obj.index)
File ~/work/lessons/lessons/.nox/docs-test/lib/python3.11/site-packages/pandas/core/base.py:921, in IndexOpsMixin._map_values(self, mapper, na_action, convert)
918 if isinstance(arr, ExtensionArray):
919 return arr.map(mapper, na_action=na_action)
--> 921 return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
File ~/work/lessons/lessons/.nox/docs-test/lib/python3.11/site-packages/pandas/core/algorithms.py:1743, in map_array(arr, mapper, na_action, convert)
1741 values = arr.astype(object, copy=False)
1742 if na_action is None:
-> 1743 return lib.map_infer(values, mapper, convert=convert)
1744 else:
1745 return lib.map_infer_mask(
1746 values, mapper, mask=isna(values).view(np.uint8), convert=convert
1747 )
File lib.pyx:2972, in pandas._libs.lib.map_infer()
Cell In[19], line 47, in format_date(date_parts)
33 def format_date(date_parts: list) -> str:
34 """
35 Format date parts into a string.
36
(...)
45 A date formatted as a `pd.datetime` object.
46 """
---> 47 date_str = f"{date_parts[0][0]}-{date_parts[0][1]:02d}-{date_parts[0][2]:02d}"
48 return pd.to_datetime(date_str, format="%Y-%m-%d")
ValueError: Unknown format code 'd' for object of type 'str'