Python logarithm speed

A few weeks ago a friend of mine brought up something to do with logarithms in a group chat. One thing led to another, and out of curiosity I timed a couple of Python’s built-in log functions:

>>> timeit.timeit('[math.log10(rand) for rand in r]',
... setup='import math;import random;r = [random.random() for _ in range(10000000)]',
... number=1)
2.0042254191357642
>>> timeit.timeit('[math.log(rand) for rand in r]',
... setup='import math;import random;r = [random.random() for _ in range(10000000)]', 
... number=1)
2.345342932967469

Surprisingly, log base 10 is 14.5% faster than natural log. Why is this? I did a quick Google for ‘python log speed’ and got a bunch of unrelated articles on logging, so I decided to take a look [1].

As some background, the majority of Python installations are backed by an interpreter called CPython [2]. This shouldn’t be confused with Cython, which is a C-like extension language that facilitates more performant code; CPython is the reference implementation, written in C, that actually executes everything you write in Python. This is where we’ll be digging for answers.

There are a couple of hoops to jump through to get to the code that actually calculates the logs. We start with the math module definition in cmathmodule.c. Module definitions are the entry point into CPython for each of their methods: each method lives as a #define containing its name in Python (e.g. log, log10, tanh) and a pointer to the C function implementing it. Both of these are scattered throughout cmathmodule.c.h [3], although most of the logic in that file is error handling around other functions back in cmathmodule.c, which do the heavy lifting. Those are the ones we’re interested in, and I’ve reproduced the code for log below:

 1  static PyObject *
 2  cmath_log_impl(PyObject *module, Py_complex x, PyObject *y_obj)
 3  {
 4      Py_complex y;
 5
 6      errno = 0;
 7      x = c_log(x);
 8      if (y_obj != NULL) {
 9          y = PyComplex_AsCComplex(y_obj);
10          if (PyErr_Occurred()) {
11              return NULL;
12          }
13          y = c_log(y);
14          x = _Py_c_quot(x, y);
15      }
16      if (errno != 0)
17          return math_error();
18      return PyComplex_FromCComplex(x);
19  }

We can form a hypothesis by reading through this code. Even though c_log (line 7) returns the natural logarithm, this code also handles the general case of calculating the log in any base. As such, even if we’re just looking to calculate \(\ln\), the conditional on line 8 must be checked, slowing things down regardless of whether it triggers.
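To make the shape of that logic concrete, here’s a rough pure-Python sketch of what cmath_log_impl is doing; the function name and structure are mine, purely for illustration (the real work happens in C, on complex values):

import math

def log_sketch(x, base=None):
    # Always take the natural log first, as cmath_log_impl does with c_log.
    result = math.log(x)
    # Only when a base is supplied is a second log taken and a division done,
    # but the branch itself still has to be tested on every call.
    if base is not None:
        result = result / math.log(base)
    return result

However cheap that check is, the single-argument path still pays for it.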

This setup allows the Python math.log function to optionally take a base. If one is given then c_log is called again to do a base conversion (lines 13-14). We can see what this does to run time:

>>> timeit.timeit('[math.log(rand, 5) for rand in r]',
... setup='import math;import random;r = [random.random() for _ in range(10000000)]',
... number=1)
2.868814719840884

As expected, we get an increase. With this all established, how does log10 manage to be faster? Let’s take a look:

static Py_complex
cmath_log10_impl(PyObject *module, Py_complex z)
{
    Py_complex r;
    int errno_save;

    r = c_log(z);
    errno_save = errno;
    r.real = r.real / M_LN10;
    r.imag = r.imag / M_LN10;
    errno = errno_save;
    return r;
}

This code still uses c_log, but does the base conversion using the constant M_LN10 = \(\log_e 10\). It’s not great to come to conclusions on performance without profiling [4], but it looks like doing away with the conditional and the extra c_log call accounts for the difference.
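That constant is just the change-of-base formula baked in: \(\log_{10} x = \ln x / \ln 10\). A quick sanity check in Python (the value of x and the tolerance are arbitrary choices of mine):

import math

x = 0.42
# log10 via the natural log and a division, mirroring what cmath_log10_impl does
via_constant = math.log(x) / math.log(10)
print(math.log10(x), via_constant)
# The two agree to within floating-point rounding
assert math.isclose(math.log10(x), via_constant, rel_tol=1e-12)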

What’s the lesson here? If performance really matters to you, then using math.log10 and doing the base conversion yourself will be fastest: the log base 5 example took 2.20 seconds (23% faster) when calculated as math.log10(rand) / log_10_5 rather than math.log(rand, 5) (there’s a sketch of this after the NumPy timing below). Surprisingly, using NumPy’s log function as a drop-in replacement inside the same list comprehension took 10.81 seconds, but the real lesson is that if you really need the log of 10 million numbers at once then vectorisation is your friend:

>>> timeit.timeit('numpy.log(r)', 
... setup='import numpy;import random;r = [random.random() for _ in range(10000000)]',
... number=1)
0.6032462348230183
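For reference, the precomputed-constant version of the base 5 example looks something like this; the exact setup string is a reconstruction, but this is the calculation behind the 2.20 second figure above (output omitted):

>>> import timeit
>>> timeit.timeit('[math.log10(rand) / log_10_5 for rand in r]',
... setup='import math;import random;log_10_5 = math.log10(5);r = [random.random() for _ in range(10000000)]',
... number=1)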
  1. It didn’t occur to me until just now that ‘logarithm’ is a way better search term and gives some relevant results, but here we are. 

  2. An alternative interpreter is PyPy, which is often faster than CPython at the expense of some compatibility issues with packages (e.g. pandas, scikit-learn, scipy, matplotlib - see here). PyPy is not to be confused with PyPI, a package management system for Python, in turn not to be confused with conda, another package management system. Some confusion is permitted. 

  3. Here for log and here for log10.

  4. I had a look for how to profile CPython, but didn’t find anything obvious other than perf.

Hello world

It took 2050 days for me to put something substantial on this domain, but finally here we are. I’ve considered putting something up multiple times since the bare Wordpress installation that used to live here died, and have a few notebook pages’ worth of ideas sketched out to varying degrees of completeness (skewed significantly to one end of that spectrum – no guesses which!). At some point while thinking about what an interactive mind map of articles (organised hierarchically by subjects/topics, obviously) might look like (this?), I realised just how much time I was wasting. There was a toss-up between how ‘unique’ I wanted the site to be and how large the pile of potential project write-ups was becoming, and as time went on the latter began to outweigh the former.

Hence, this. I knew about static site generators while dreaming up those ideas, but was hesitant to add (what I perceived to be) complexity to what should just be a simple site with some blog posts. As my mind map idea might suggest, my imagination was actually the main source of complexity, and it turns out that I’d also significantly underestimated how simple Jekyll was to set up and how many templates were available. Having gone through the process of choosing a template and adapting it, I’m convinced I made the right choice – agonising over the design of an entire site to the same extent that I did over the type and colour of the social media glyphs would result in it never being completed.

Generators like this also lower the mental barrier to writing a new post even further. Make a new markdown file, commit & push, and you’re golden. Pagination? That’s a plugin. RSS? Not that it’ll likely get used, but at least it’s there for free. Tagging? Another plugin, and almost certainly not something I’d end up implementing myself. One of my remaining worries was something along the lines of “but I might want to have some kind of dynamic project hosted on the site in the future”, which, whilst possibly true, sounds suspiciously like an excuse to put things off. Let’s cross that bridge when we come to it.

I’ll round this off by mentioning Steve Yegge’s post on blogging. It’s well worth a read.