DateTime (de)Serialization Benchmarks from Python, Numpy, Chrono, and Time
Datetime parsing and rendering sometimes require optimization when iterating over a large dataset. Say you have a couple million rows of timestamps you'd like to parse into a datatype; it can take a surprising amount of time if you use the wrong import or crate. In this article, I'll benchmark what it takes to load a couple million datetime stamps with Python's datetime, Numpy, Chrono, and Time.
We'll explore different architectural considerations and design patterns to improve the ergonomics of these libraries, giving you code you can drop into your project with minimal effort.
There are some considerations to weigh when selecting the right datetime library.
- Leap Seconds
- Leap Years
- US Daylight Saving Time Offsets
- Nanosecond Support
Where to apply these considerations is an architectural decision. For example, storing all datetime strings in a database in UTC takes care of leap seconds and leap years, and avoids having to manage US Daylight Saving Time offsets. When rendering a datetime string on a webpage with technologies such as VueJS or ReactJS, we can leverage TypeScript/JavaScript to transform those UTC datetime strings into client-facing, timezone-aware objects. Browsers such as Firefox, Chrome, Edge, and Opera know where the user is situated based on the computer's system clock. Therefore, we can pass a UTC datetime string into new Date() and it should render in the correct timezone with the correct offset. Furthermore, we can load those same UTC datetime strings into Python or Rust and format the objects accordingly in the event we need to render a time-series.
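For the storage side, here is a minimal Python sketch of normalizing an aware datetime to UTC before persisting it (the UTC-6 offset below is purely illustrative):

```python
from datetime import datetime, timedelta, timezone

# An aware datetime in some local timezone (UTC-6 here, as an example).
local = datetime(2024, 10, 2, 0, 12, 10, tzinfo=timezone(timedelta(hours=-6)))

# Normalize to UTC before storing it in the database.
stored = local.astimezone(timezone.utc)
print(stored.strftime('%Y-%m-%dT%H:%M:%S%z'))  # 2024-10-02T06:12:10+0000
```

Everything downstream (the browser, Python, or Rust) can then treat the stored string as UTC.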
Python's datetime Module
I find that when working with the datetime module, it's fast enough to use in a JSON Web API, but too slow when loading, rendering, or generating large amounts of data. Therefore, it's well suited to enterprise projects built on frameworks like Django, FastAPI, or Flask.
import typing
from datetime import datetime, timedelta, timezone

T = typing.TypeVar('T')

class DateTimeUTC:
    FORMAT = '%Y-%m-%dT%H:%M:%S%z'

    def __init__(self, current_datetime: str | datetime | None = None) -> None:
        if isinstance(current_datetime, datetime):
            self._datetime = current_datetime
        elif isinstance(current_datetime, str):
            self._datetime = datetime.strptime(current_datetime, self.FORMAT)
        elif current_datetime is None:
            self._datetime = datetime.now(timezone.utc)
        else:
            raise NotImplementedError(current_datetime.__class__)

    def __str__(self) -> str:
        return self._datetime.strftime(self.FORMAT)

    def __repr__(self) -> str:
        return f'DateTimeUTC: {self}'

    def __eq__(self, other: T) -> bool:
        return self._datetime.__eq__(other._datetime)

    def __ne__(self, other: T) -> bool:
        return self._datetime.__ne__(other._datetime)

    def __lt__(self, other: T) -> bool:
        return self._datetime.__lt__(other._datetime)

    def __le__(self, other: T) -> bool:
        return self._datetime.__le__(other._datetime)

    def __gt__(self, other: T) -> bool:
        return self._datetime.__gt__(other._datetime)

    def __ge__(self, other: T) -> bool:
        return self._datetime.__ge__(other._datetime)

    def __add__(self, delta: timedelta) -> 'DateTimeUTC':
        if isinstance(delta, timedelta):
            return DateTimeUTC(self._datetime + delta)
        raise NotImplementedError(delta.__class__)

    def __sub__(self, delta: timedelta | T) -> timedelta | T:
        if isinstance(delta, timedelta):
            return DateTimeUTC(self._datetime - delta)
        elif isinstance(delta, self.__class__):
            return self._datetime - delta._datetime
        else:
            raise NotImplementedError(delta.__class__)
In the implementation above, a series of dunder methods are defined on the DateTimeUTC object. Those methods allow us to compare objects and add or subtract time from an existing object using timedelta. At the core is the formatting, implemented in a similar way to how the new Date() object behaves in common browsers.
>>> future_timestamp = str(DateTimeUTC() + timedelta(hours=1))
>>> future_timestamp
'2024-10-02T06:12:10+0000'
The format %Y-%m-%dT%H:%M:%S%z provides a datetime string that new Date() can interpret without having to manage timezones in TypeScript.
new Date('2024-10-02T06:12:10+0000')
Okay, great. Now that we know how to use the datetime module, let's benchmark its performance.
timestamps = [DateTimeUTC(datetime(year=1800, month=1, day=1, tzinfo=timezone.utc)) + timedelta(seconds=idx) for idx in range(0, 2000000)]
results = []
for idx in range(0, 10):
    start = DateTimeUTC()
    timestamps_rendered = []
    for timestamp in timestamps:
        timestamps_rendered.append(str(timestamp))
    stop = DateTimeUTC()
    duration = stop - start
    results.append(duration)

# Rendering timestamp results
[r.seconds for r in results]
[4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
Adding the timezone to the datetime object increased the duration of each benchmark by almost a whole second.
results = []
for idx in range(0, 10):
    print("Iteration: ", idx)
    start = DateTimeUTC()
    for timestamp in timestamps_rendered:
        _ = DateTimeUTC(timestamp)
    stop = DateTimeUTC()
    duration = stop - start
    results.append(duration)

# Loading timestamp results
[r.seconds for r in results]
[10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
As you can see, loading the timestamp is immensely slower. We can extrapolate meaning from these results: if our JSON Web APIs are receiving more than, say, 500,000 requests per second, then this is probably an area where we could improve API performance; it's just a lot of CPU time spent rendering data. We could also consider assuming every datetime string is UTC and drop handling the timezone entirely. I personally wouldn't do that, mostly because I think it is more Pythonic to keep the timezone intact. Explicit is better than implicit.
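To put a rough number on that timezone overhead, a quick micro-benchmark with the standard library's timeit can compare parsing with and without the %z directive; absolute timings will vary by machine:

```python
import timeit
from datetime import datetime

WITH_TZ = '%Y-%m-%dT%H:%M:%S%z'
NO_TZ = '%Y-%m-%dT%H:%M:%S'

aware = '2024-10-02T06:12:10+0000'
naive = '2024-10-02T06:12:10'

# Time 100,000 parses with and without the timezone directive.
t_aware = timeit.timeit(lambda: datetime.strptime(aware, WITH_TZ), number=100_000)
t_naive = timeit.timeit(lambda: datetime.strptime(naive, NO_TZ), number=100_000)
print(f'with %z: {t_aware:.3f}s  without %z: {t_naive:.3f}s')
```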
Numpy's np.datetime64
numpy provides datetime64, which is timezone-unaware, meaning it won't handle UTC, EST, IST, JPN, or GMT. Therefore we'll need to omit the timezone data. This probably improves the performance of the module and is much more ideal for loading data into a database, rendering flat files, or producing parquet files. Like we did with the DateTimeUTC class, we'll make a Timestamp class to encapsulate logic for datetime alterations.
import enum
import operator
import typing

import numpy as np

TS_FORMAT = '%Y-%m-%dT%H:%M:%S%z'
NP_TS_FORMAT = '%Y-%m-%dT%H:%M:%S.%f'
T = typing.TypeVar('T')

class DatetimeEncodeError(Exception):
    pass

class TimestampPart(enum.Enum):
    Year = 'Y'
    Month = 'm'
    Week = 'W'
    Day = 'd'
    Hour = 'H'
    Minute = 'M'
    Second = 'S'
    Nanosecond = 'f'

class Timestamp(typing.NamedTuple):
    value: np.datetime64

    def __hash__(self) -> int:
        return self.value.__hash__()

    def __add__(self: T, delta: np.timedelta64) -> T:
        if not isinstance(delta, np.timedelta64):
            raise TypeError(f'Expected np.timedelta64, got {type(delta)}')
        return self.__class__(operator.add(self.value, delta))

    def __sub__(self: T, delta: np.timedelta64) -> T:
        if not isinstance(delta, np.timedelta64):
            raise TypeError(f'Expected np.timedelta64, got {type(delta)}')
        return self.__class__(operator.sub(self.value, delta))

    def __gt__(self, timestamp: T) -> bool:
        if not isinstance(timestamp, self.__class__):
            raise TypeError(f'Expected Timestamp, got {type(timestamp)}')
        return operator.gt(self.value, timestamp.value)

    def __ge__(self, timestamp: T) -> bool:
        if not isinstance(timestamp, self.__class__):
            raise TypeError(f'Expected Timestamp, got {type(timestamp)}')
        return operator.ge(self.value, timestamp.value)

    def __eq__(self, timestamp: T) -> bool:
        if not isinstance(timestamp, self.__class__):
            raise TypeError(f'Expected Timestamp, got {type(timestamp)}')
        return operator.eq(self.value, timestamp.value)

    def __le__(self, timestamp: T) -> bool:
        if not isinstance(timestamp, self.__class__):
            raise TypeError(f'Expected Timestamp, got {type(timestamp)}')
        return operator.le(self.value, timestamp.value)

    def __lt__(self, timestamp: T) -> bool:
        if not isinstance(timestamp, self.__class__):
            raise TypeError(f'Expected Timestamp, got {type(timestamp)}')
        return operator.lt(self.value, timestamp.value)

    def format(self: T, format: str = TS_FORMAT) -> str:
        return str(self.value)

    @classmethod
    def Parse(cls: T, value: str) -> T:
        return cls(np.datetime64(value))

    def replace(self: T, replace_part: TimestampPart, replace_value: int) -> T:
        result = []
        skip_one = False
        for idx, char in enumerate(NP_TS_FORMAT):
            if skip_one is True:
                skip_one = False
                continue
            if char == '%':
                code = NP_TS_FORMAT[idx + 1]
                try:
                    part_type = [part for part in TimestampPart if part.value == code][0]
                except IndexError:
                    raise DatetimeEncodeError(f'Unknown datetime part: {char}{code}')
                else:
                    if part_type is TimestampPart.Year:
                        value = str(self.value.astype('datetime64[Y]'))
                    elif part_type is TimestampPart.Month:
                        value = str(self.value.astype('datetime64[M]')).rsplit('-', 1)[1]
                    elif part_type is TimestampPart.Day:
                        value = str(self.value.astype('datetime64[D]')).rsplit('-', 1)[1]
                    elif part_type is TimestampPart.Hour:
                        value = str(self.value.astype('datetime64[h]')).rsplit('T', 1)[1]
                    elif part_type is TimestampPart.Minute:
                        value = str(self.value.astype('datetime64[m]')).rsplit(':', 1)[1]
                    elif part_type is TimestampPart.Second:
                        value = str(self.value.astype('datetime64[s]')).rsplit(':', 1)[1]
                    elif part_type is TimestampPart.Nanosecond:
                        value = str(self.value.astype('datetime64[ns]')).rsplit('.', 1)[1]
                    else:
                        raise NotImplementedError(part_type)
                    if replace_part is part_type:
                        if part_type is TimestampPart.Nanosecond:
                            zero_diff = 9 - len(str(replace_value))
                            result.append(f'{"0" * zero_diff}{replace_value}')
                        elif replace_value < 10:
                            result.append(f'0{replace_value}')
                        else:
                            result.append(str(replace_value))
                    else:
                        result.append(value)
                    skip_one = True
            else:
                result.append(char)
        return self.__class__(np.datetime64(''.join(result)))
As you can see, there is a lot going on. Most of the logic is used for datetime alterations and not serialization. We'll run the rendering benchmark with the same series of timestamps, but down to nanosecond support and without a timezone.
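The unit-truncation casts that replace relies on are easier to see in isolation; a small standalone sketch:

```python
import numpy as np

ts = np.datetime64('2024-10-02T06:12:10')

# Casting to a coarser unit truncates the string representation,
# which lets us slice out individual datetime parts.
print(str(ts.astype('datetime64[Y]')))                    # 2024
print(str(ts.astype('datetime64[M]')).rsplit('-', 1)[1])  # 10
print(str(ts.astype('datetime64[h]')).rsplit('T', 1)[1])  # 06
```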
timestamps = [Timestamp(np.datetime64(datetime(year=1800, month=1, day=1) + timedelta(seconds=idx), 'ns')) for idx in range(0, 2000000)]
results = []
for idx in range(0, 10):
    start = datetime.now(timezone.utc)
    timestamps_rendered = []
    for timestamp in timestamps:
        timestamps_rendered.append(timestamp.format())
    stop = datetime.now(timezone.utc)
    duration = stop - start
    results.append(duration)

>>> [r.seconds for r in results]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
The results show sub-second serialization; let's find the mean.

>>> sum([r.microseconds for r in results]) / 10
371548.3

That's 371,548.3 microseconds on average, or about 371.5 milliseconds for each benchmark. Considerably faster than the datetime module, which is to be expected: numpy is accelerated with C code, and its Python API exists to make data processing more manageable. In fact, it has been suggested to me in the past that the for loop might be what's slowing this code down, and not the serialization routine numpy performs when changing the datetime64 value to a string.
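To test that hunch, numpy can serialize an entire array without a Python-level loop using np.datetime_as_string; a sketch of the vectorized equivalent (using epoch-based timestamps rather than the 1800 start date above):

```python
import numpy as np

# Two million nanosecond timestamps, one second apart, starting at the epoch.
timestamps = np.arange(0, 2_000_000, dtype='int64').astype('datetime64[s]').astype('datetime64[ns]')

# Vectorized rendering: no Python-level loop over individual elements.
rendered = np.datetime_as_string(timestamps, unit='ns')
print(rendered[0])  # 1970-01-01T00:00:00.000000000
```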
Let's go ahead and run a benchmark for loading the datetime string.
results = []
for idx in range(0, 10):
    start = datetime.now(timezone.utc)
    for timestamp in timestamps_rendered:
        _ = Timestamp.Parse(timestamp)
    stop = datetime.now(timezone.utc)
    duration = stop - start
    results.append(duration)

>>> [r.seconds for r in results]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Loading the datetime string shows similar sub-second results. Let's again find the mean.

>>> sum([r.microseconds for r in results]) / 10
774868.9

There is definitely a performance hit when loading the datetime string into np.datetime64 (about 775 milliseconds per benchmark). However, it is still a significant performance increase compared to Python's datetime module.
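Parsing can likewise be vectorized by handing numpy the whole sequence of strings at once; a small sketch:

```python
import numpy as np

rendered = ['1970-01-01T00:00:00.000000000', '1970-01-01T00:00:01.000000000']

# Vectorized parsing: numpy converts the entire sequence in C,
# instead of constructing one Python object per string.
parsed = np.array(rendered, dtype='datetime64[ns]')
print(parsed[1])  # 1970-01-01T00:00:01.000000000
```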
Moving into Rust, one of the expectations is that our code will run faster. We also have to consider that not all Python is the same; some Python packages ship with C code to improve performance. For example, take a look at httptools, asyncpg, and uvloop, all developed by MagicStack. Yeah, I'm kind of a fan of the software. With that said, Rust can still be written in a way that runs slower than Python.
Chrono
Chrono was the first crate I used to perform datetime alterations in Rust. As a result, I have an affinity for chrono more so than some of the other choices. When I design software in Rust, it's primarily for machine-to-machine comms and less for server/client-browser comms. With that said, I have rolled a couple of web stacks, and serde-based serialization is used extensively throughout my code.
#[derive(Debug, Clone, serde::Deserialize, serde::Serialize)]
struct DatetimeString(chrono::DateTime<chrono::Utc>);

pub static TS_FORMAT: &str = "%Y-%m-%dT%H:%M:%S%.9f%z";

fn only_chrono() {
    let mut timestamps = Vec::new();
    for _ in 0..2_000_000 {
        timestamps.push(chrono::Utc::now());
    }
    let mut results = Vec::new();
    let mut rendered_timestamps = Vec::new();
    for _ in 0..10 {
        rendered_timestamps.clear();
        println!("Iteration {:?}", chrono::Utc::now());
        let start = chrono::Utc::now();
        for timestamp in timestamps.clone() {
            let result = timestamp.format(TS_FORMAT).to_string();
            rendered_timestamps.push(result);
        }
        let stop = chrono::Utc::now();
        let duration = stop - start;
        results.push(duration);
    }
    println!("Serialization Duration: {:?}", results);

    let mut results = Vec::new();
    for _ in 0..10 {
        println!("Iteration {:?}", chrono::Utc::now());
        let start = chrono::Utc::now();
        for timestamp in rendered_timestamps.clone() {
            let _ = chrono::DateTime::parse_from_str(timestamp.as_str(), TS_FORMAT).expect("Failed");
        }
        let stop = chrono::Utc::now();
        let duration = stop - start;
        results.push(duration);
    }
    println!("Deserialization Duration: {:?}", results);
}
fn with_serde_and_chrono() {
    let mut timestamps = Vec::new();
    for _ in 0..2_000_000 {
        timestamps.push(DatetimeString(chrono::Utc::now()));
    }
    let mut results = Vec::new();
    let mut rendered_timestamps = Vec::new();
    for _ in 0..10 {
        rendered_timestamps.clear();
        println!("Iteration {:?}", chrono::Utc::now());
        let start = chrono::Utc::now();
        for timestamp in timestamps.clone() {
            let result = serde_json::to_string(&timestamp).expect("Failed");
            rendered_timestamps.push(result);
        }
        let stop = chrono::Utc::now();
        let duration = stop - start;
        results.push(duration);
    }
    println!("Serialization Duration: {:?}", results);

    let mut results = Vec::new();
    for _ in 0..10 {
        println!("Iteration {:?}", chrono::Utc::now());
        let start = chrono::Utc::now();
        for timestamp in rendered_timestamps.clone() {
            let _: DatetimeString = serde_json::from_str(timestamp.as_str()).expect("Failed");
        }
        let stop = chrono::Utc::now();
        let duration = stop - start;
        results.push(duration);
    }
    println!("Deserialization Duration: {:?}", results);
}
Each iteration renders 2 million timestamps, which generate considerably faster than with Python's datetime module. DateTime<Utc> is formatted into a datetime string with nanosecond precision. I presume the timezone handling accounts for some slowdown, but I won't test for that: if the software can support timezones, those should be included in the benchmark, so that the maximum amount of information indicates the correct point in time regardless of where you are on Earth.
The deserialization takes TS_FORMAT and rebuilds the datetime strings into Rust values of DateTime<Utc>.
Serialization Duration: [
TimeDelta { secs: 8, nanos: 548368000 },
TimeDelta { secs: 8, nanos: 529499000 },
TimeDelta { secs: 8, nanos: 547975000 },
TimeDelta { secs: 8, nanos: 532770000 },
TimeDelta { secs: 8, nanos: 534317000 },
TimeDelta { secs: 8, nanos: 576425000 },
TimeDelta { secs: 8, nanos: 519606000 },
TimeDelta { secs: 8, nanos: 546453000 },
TimeDelta { secs: 8, nanos: 535094000 },
TimeDelta { secs: 8, nanos: 622479000 }]
Deserialization Duration: [
TimeDelta { secs: 11, nanos: 785758000 },
TimeDelta { secs: 11, nanos: 749219000 },
TimeDelta { secs: 11, nanos: 946002000 },
TimeDelta { secs: 11, nanos: 846905000 },
TimeDelta { secs: 11, nanos: 558516000 },
TimeDelta { secs: 11, nanos: 828807000 },
TimeDelta { secs: 11, nanos: 566514000 },
TimeDelta { secs: 11, nanos: 669675000 },
TimeDelta { secs: 11, nanos: 751291000 },
TimeDelta { secs: 11, nanos: 431745000 }]
I was staggered by the results. chrono is running noticeably slower than Python's datetime module, and far slower than Numpy's datetime64.
Here is a second pair of serialization/deserialization benchmarks to see if there is a performance difference using serde.
Serialization Duration: [
TimeDelta { secs: 6, nanos: 741766000 },
TimeDelta { secs: 6, nanos: 739703000 },
TimeDelta { secs: 6, nanos: 752363000 },
TimeDelta { secs: 6, nanos: 757359000 },
TimeDelta { secs: 6, nanos: 783687000 },
TimeDelta { secs: 6, nanos: 790181000 },
TimeDelta { secs: 6, nanos: 739028000 },
TimeDelta { secs: 6, nanos: 740279000 },
TimeDelta { secs: 6, nanos: 735011000 },
TimeDelta { secs: 6, nanos: 733734000 }]
Deserialization Duration: [
TimeDelta { secs: 7, nanos: 839185000 },
TimeDelta { secs: 7, nanos: 825194000 },
TimeDelta { secs: 7, nanos: 813160000 },
TimeDelta { secs: 7, nanos: 817082000 },
TimeDelta { secs: 7, nanos: 823954000 },
TimeDelta { secs: 7, nanos: 814741000 },
TimeDelta { secs: 7, nanos: 817677000 },
TimeDelta { secs: 7, nanos: 857673000 },
TimeDelta { secs: 7, nanos: 819424000 },
TimeDelta { secs: 7, nanos: 818405000 }]
An obvious performance increase using serde. I didn't expect these results, and I hope someone more familiar with the software can point out why serde can serialize and deserialize DateTime<Utc> quicker than using .format.
Time
I haven't used time much, but it seems to be used reliably by a number of dependencies I've started pulling from. Therefore it has been pushed onto my radar, and I'd like to determine if the software is more performant than chrono.
The benchmark will be set up similarly to the Chrono benchmarks. We'll test formatting OffsetDateTime into a datetime string and back. Then I'll implement serde and see if there is a performance gain or loss.
#[derive(Debug, Clone, serde::Deserialize, serde::Serialize)]
struct TimeDateTimeString(#[serde(with = "time::serde::rfc3339")] time::OffsetDateTime);

fn only_time() {
    let mut timestamps = Vec::new();
    for _ in 0..2_000_000 {
        timestamps.push(time::OffsetDateTime::now_utc());
    }
    let mut results = Vec::new();
    let mut rendered_timestamps = Vec::new();
    let ts_format = "[year]-[month]-[day]T[hour repr:24 padding:none]:[minute]:[second].[subsecond digits:9][offset_hour sign:mandatory]";
    let ts_format = time::format_description::parse(ts_format).expect("Unable to set formatter");
    for _ in 0..10 {
        rendered_timestamps.clear();
        println!("Iteration {:?}", time::OffsetDateTime::now_utc());
        let start = time::OffsetDateTime::now_utc();
        for timestamp in timestamps.clone() {
            rendered_timestamps.push(timestamp.format(&ts_format).unwrap());
        }
        let stop = time::OffsetDateTime::now_utc();
        let duration = stop - start;
        results.push(duration);
    }
    println!("Serialization Results: {:?}", results);

    let mut results = Vec::new();
    for _ in 0..10 {
        println!("Iteration: {:?}", time::OffsetDateTime::now_utc());
        let start = time::OffsetDateTime::now_utc();
        for timestamp in rendered_timestamps.clone() {
            let _ = time::OffsetDateTime::parse(&timestamp, &ts_format).expect("Failed");
        }
        let stop = time::OffsetDateTime::now_utc();
        let duration = stop - start;
        results.push(duration);
    }
    println!("Deserialization Duration: {:?}", results);
}
fn with_serde_and_time() {
    let mut timestamps = Vec::new();
    for _ in 0..2_000_000 {
        timestamps.push(TimeDateTimeString(time::OffsetDateTime::now_utc()));
    }
    let mut results = Vec::new();
    let mut rendered_timestamps = Vec::new();
    for _ in 0..10 {
        rendered_timestamps.clear();
        println!("Iteration {:?}", time::OffsetDateTime::now_utc());
        let start = time::OffsetDateTime::now_utc();
        for timestamp in timestamps.clone() {
            let result = serde_json::to_string(&timestamp).expect("Failed");
            rendered_timestamps.push(result);
        }
        let stop = time::OffsetDateTime::now_utc();
        let duration = stop - start;
        results.push(duration);
    }
    println!("Serialization Duration: {:?}", results);

    let mut results = Vec::new();
    for _ in 0..10 {
        println!("Iteration {:?}", time::OffsetDateTime::now_utc());
        let start = time::OffsetDateTime::now_utc();
        for timestamp in rendered_timestamps.clone() {
            let _: TimeDateTimeString = serde_json::from_str(timestamp.as_str()).expect("Failed");
        }
        let stop = time::OffsetDateTime::now_utc();
        let duration = stop - start;
        results.push(duration);
    }
    println!("Deserialization Duration: {:?}", results);
}
The time::OffsetDateTime API took me some time to get used to, but it seems more idiomatic than chrono, datetime, and datetime64. I think that's great, but at the same time it makes it hard to find the correct syntax for, say, the ts_format variable. I specified that I'd like nanosecond precision, but I have no idea how to set that level of precision on OffsetDateTime::now_utc().
Serialization Results: [
Duration { seconds: 4, nanoseconds: 939048000 },
Duration { seconds: 4, nanoseconds: 945770000 },
Duration { seconds: 4, nanoseconds: 956920000 },
Duration { seconds: 4, nanoseconds: 945419000 },
Duration { seconds: 4, nanoseconds: 949541000 },
Duration { seconds: 4, nanoseconds: 947545000 },
Duration { seconds: 4, nanoseconds: 951264000 },
Duration { seconds: 4, nanoseconds: 975607000 },
Duration { seconds: 4, nanoseconds: 949841000 },
Duration { seconds: 4, nanoseconds: 941445000 }]
Deserialization Results: [
Duration { seconds: 7, nanoseconds: 333048000 },
Duration { seconds: 7, nanoseconds: 339670000 },
Duration { seconds: 7, nanoseconds: 331340000 },
Duration { seconds: 7, nanoseconds: 323940000 },
Duration { seconds: 7, nanoseconds: 321096000 },
Duration { seconds: 7, nanoseconds: 327553000 },
Duration { seconds: 7, nanoseconds: 349085000 },
Duration { seconds: 7, nanoseconds: 314581000 },
Duration { seconds: 7, nanoseconds: 338267000 },
Duration { seconds: 7, nanoseconds: 324325000 }]
Way better results than chrono's datetime string formatting, and fairly close to Python's datetime module's serialization speed. An obvious improvement over chrono, but still on par with datetime and behind datetime64.
Here is a second pair of serialization/deserialization benchmarks to see if there is a performance difference using serde.
Serialization Duration: [
Duration { seconds: 5, nanoseconds: 285355000 },
Duration { seconds: 5, nanoseconds: 363159000 },
Duration { seconds: 5, nanoseconds: 300327000 },
Duration { seconds: 5, nanoseconds: 277747000 },
Duration { seconds: 5, nanoseconds: 275644000 },
Duration { seconds: 5, nanoseconds: 279714000 },
Duration { seconds: 5, nanoseconds: 273839000 },
Duration { seconds: 5, nanoseconds: 317712000 },
Duration { seconds: 5, nanoseconds: 297983000 },
Duration { seconds: 5, nanoseconds: 285739000 }]
Deserialization Duration: [
Duration { seconds: 7, nanoseconds: 849486000 },
Duration { seconds: 7, nanoseconds: 826524000 },
Duration { seconds: 7, nanoseconds: 824564000 },
Duration { seconds: 7, nanoseconds: 850677000 },
Duration { seconds: 7, nanoseconds: 822075000 },
Duration { seconds: 7, nanoseconds: 831446000 },
Duration { seconds: 7, nanoseconds: 827239000 },
Duration { seconds: 7, nanoseconds: 826451000 },
Duration { seconds: 7, nanoseconds: 832332000 },
Duration { seconds: 7, nanoseconds: 906543000 }]
Only a small performance hit for serialization and a very minor one for deserialization. Very impressive.
Comparative Results
Module / Software | Average Serialization Duration | Average Deserialization Duration
---|---|---
Python's datetime | 4 seconds | 10 seconds
Numpy's datetime64 | 372 ms | 775 ms
Chrono | 8 seconds | 11 seconds
Chrono & Serde | 6 seconds | 7 seconds
Time | 4 seconds | 7 seconds
Time & Serde | 5 seconds | 7 seconds
The comparative results have me wondering if there is a correlation between the datetime64 and time results. Obviously, if you're going to load large amounts of data, Numpy's datetime64 datetime strings are the way to go for now. I wonder how well datetime64 would perform in a Spark runtime.