Caching

Inevitably, there will be occasions during application development or deployment when some task is revealed to be taking a significant amount of time to complete. When this occurs, caching is often the best way to speed things up.

Pylons comes with caching middleware enabled. It is provided by Beaker, the same package that handles sessions, and supports a variety of caching backends: memory-based, filesystem-based and the specialised memcached library.

There are several ways to cache data under Pylons, depending on where the slowdown is occurring:

  • Browser-side Caching - HTTP/1.1 supports the ETag caching system, which allows the browser to use its own cache instead of requiring regeneration of the entire page. ETag-based caching avoids repeated generation of content, but if the browser has never seen the page before, the page will still be generated. Using ETag caching in conjunction with one of the other types of caching listed here therefore achieves the best throughput and avoids unnecessary calls to resource-intensive operations. Note that ETag caching only helps if the entire page can be cached.

  • Controllers - The cache object can be imported in controllers and used to cache any Python object that can be pickled.
  • Templates - The results of an entire rendered template can be cached using the three cache keyword arguments to the render calls (cache_key, cache_type and cache_expire). These render commands can also be used inside templates.
  • Mako/Myghty Templates - Built-in caching options are available for both Mako and Myghty template engines. They allow fine-grained caching of only certain sections of the template as well as caching of the entire template.

The two primary concepts to bear in mind when caching are i) caches have a namespace and ii) caches can have keys under that namespace. The reason for this is that a single template might have multiple versions, each requiring its own cached copy. The name of the template is the namespace, and each version is identified by a key within that namespace. Both of these values must be Python strings.

In templates, the cache namespace will automatically be set to the name of the template being rendered. Nothing else is required for basic caching, unless the developer wishes to control how long the template is cached and/or maintain caches of multiple versions of the template.
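The namespace/key split can be pictured as a dictionary of dictionaries. The following pure-Python sketch only illustrates the concept and is not how Beaker is implemented; the NamespacedCache class and the template and key names are hypothetical.

```python
class NamespacedCache(object):
    """Toy two-level cache: namespace -> key -> value (illustration only)."""

    def __init__(self):
        self._namespaces = {}

    def get_value(self, namespace, key, createfunc):
        # Both levels must be strings, mirroring the rule described above.
        if not isinstance(namespace, str) or not isinstance(key, str):
            raise TypeError("namespace and key must be strings")
        ns = self._namespaces.setdefault(namespace, {})
        if key not in ns:
            # Only call the expensive function on a cache miss.
            ns[key] = createfunc()
        return ns[key]


cache = NamespacedCache()
# One template (the namespace) with two cached versions (the keys).
en = cache.get_value('/show.myt', 'en', lambda: '<p>Hello</p>')
fr = cache.get_value('/show.myt', 'fr', lambda: '<p>Bonjour</p>')
# A repeat request for 'en' is served from the cache; createfunc is not called.
again = cache.get_value('/show.myt', 'en', lambda: 'never used')
```

Roughly speaking, template caching fills in the namespace for you, and the cache_key keyword supplies the key.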

See also

Stephen Pierzchala’s Caching for Performance (stephen@pierzchala.com)

Using the Cache object

Inside the controller, the cache object needs to be imported before being used. If an action or block of code makes heavy use of resources or takes a long time to complete, it can be convenient to cache the result. The cache object can cache any Python structure that can be pickled.

Consider an action where it is desirable to cache some code that does a time-consuming or resource-intensive lookup and returns an object that can be pickled (list, dict, tuple, etc.):

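# Add to existing imports
from pylons import cache
# c (from pylons import tmpl_context as c) and render are assumed to be
# imported already, as in a generated Pylons controller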


# Under the controller class
def some_action(self, day):
    # hypothetical action that uses a 'day' variable as its key

    def expensive_function():
        # do something that takes a lot of cpu/resources;
        # expensive_call() is a placeholder for your own code
        return expensive_call()

    # Get a cache for a specific namespace; you can name it whatever
    # you want, in this case it's 'my_function'
    mycache = cache.get_cache('my_function', type="memory")

    # Get the value; this will create the cache copy the first time
    # and any time it expires (in seconds, so 3600 = one hour)
    c.myvalue = mycache.get_value(key=day, createfunc=expensive_function,
                                  expiretime=3600)

    return render('/some/template.myt')

The createfunc option requires a callable object or a function which is then called by the cache whenever a value for the provided key is not in the cache, or has expired in the cache.

Because the createfunc is called with no arguments, the resource- or time-expensive function must correspondingly also not require any arguments.
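When the cached value depends on an argument, one common approach is to bind that argument ahead of time, either with a closure (as some_action does above) or with functools.partial from the standard library. The sketch below uses a small dictionary-backed stand-in, get_value, that only imitates the createfunc/expiretime behaviour described here; it is not Beaker's code, and lookup_report is a hypothetical expensive function.

```python
import functools
import time

_store = {}  # key -> (created_at, value); stand-in for a real cache backend
calls = []   # records each real invocation of the expensive function


def get_value(key, createfunc, expiretime=None):
    """Imitates the createfunc/expiretime contract (illustration only)."""
    now = time.time()
    if key in _store:
        created_at, value = _store[key]
        if expiretime is None or now - created_at < expiretime:
            return value
    value = createfunc()  # called with no arguments
    _store[key] = (now, value)
    return value


def lookup_report(day):
    # Hypothetical expensive call that needs an argument.
    calls.append(day)
    return 'report for %s' % day


# functools.partial turns lookup_report(day) into a zero-argument callable.
first = get_value('monday', functools.partial(lookup_report, 'monday'), 3600)
second = get_value('monday', functools.partial(lookup_report, 'monday'), 3600)
```

The second call finds a fresh entry under 'monday', so lookup_report runs only once.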

Other Cache Options

The cache also supports removing values from the cache, using the key(s) to identify the value(s) to be removed, and it also supports clearing the cache completely, should it need to be reset.

# Clear the cache
mycache.clear()

# Remove a specific key
mycache.remove_value('some_key')

Using Cache keywords to render

Warning

Needs to be extended to cover the specific render_* calls introduced in Pylons 0.9.7

All render commands (for example pylons.templating.render_mako()) have caching functionality built in. To use it, add the appropriate cache keyword to the render call.

class SampleController(BaseController):

    def index(self):
        # Cache the template for 10 mins
        return render('/index.myt', cache_expire=600)

    def show(self, id):
        # Cache this version of the template for 3 mins
        return render('/show.myt', cache_key=id, cache_expire=180)

    def feed(self):
        # Cache for 20 mins in memory
        return render('/feed.myt', cache_type='memory', cache_expire=1200)

    def home(self, user):
        # Cache this version of a page forever (until the cache dir
        # is cleaned)
        return render('/home.myt', cache_key=user, cache_expire='never')

Using the Cache Decorator

Pylons also provides the beaker_cache() decorator, which caches the result of a completed function call in pylons.cache (memoizing).

The cache decorator takes the same cache arguments as the render function does, minus their cache_ prefix.

from pylons.decorators.cache import beaker_cache

class SampleController(BaseController):

    # Cache this controller action forever (until the cache dir is
    # cleaned)
    @beaker_cache()
    def home(self):
        c.data = expensive_call()
        return render('/home.myt')

    # Cache this controller action by its GET args for 10 mins in memory
    @beaker_cache(expire=600, type='memory', query_args=True)
    def show(self, id):
        c.data = expensive_call(id)
        return render('/show.myt')

By default the decorator uses a composite of all of the decorated function’s arguments as the cache key. It can alternatively use a composite of the request.GET query args as the cache key when the query_args option is enabled.

The cache key can be further customized via the key argument.
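The idea of deriving a cache key from a function's arguments can be sketched in plain Python. The memoize decorator below is hypothetical and much simpler than Pylons' beaker_cache: it builds a composite key from the positional and keyword arguments (or calls a caller-supplied key function), and keeps results in a per-function dict with an optional expiry in seconds.

```python
import functools
import time


def memoize(expire=None, key=None):
    """Hypothetical memoizing decorator (illustration only)."""
    def decorate(func):
        store = {}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if key is not None:
                cache_key = key(*args, **kwargs)
            else:
                # Composite of all arguments; kwargs are sorted for stability.
                cache_key = repr(args) + repr(sorted(kwargs.items()))
            now = time.time()
            hit = store.get(cache_key)
            if hit is not None and (expire is None or now - hit[0] < expire):
                return hit[1]
            result = func(*args, **kwargs)
            store[cache_key] = (now, result)
            return result
        wrapper.store = store  # exposed so the cache can be inspected
        return wrapper
    return decorate


calls = []


@memoize(expire=600)
def show(item_id, fmt='html'):
    calls.append((item_id, fmt))
    return '%s as %s' % (item_id, fmt)


show(1)
show(1)              # same arguments: served from the cache
show(1, fmt='json')  # different arguments: new cache entry
```

Note that this naive key treats show(1) and show(1, fmt='html') as different calls; a real implementation may normalise arguments first.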

Caching Arbitrary Functions

Arbitrary functions can also use the beaker_cache() decorator, but should pass an additional option. By default the decorator also caches the response object, and the status code and headers are unlikely to be meaningful for non-controller functions. To avoid caching that data, set the cache_response keyword argument to False.

from pylons.decorators.cache import beaker_cache

@beaker_cache(expire=600, cache_response=False)
def generate_data():
    # do expensive data generation; expensive_call() is a placeholder
    data = expensive_call()
    return data

Warning

When caching arbitrary functions, the query_args argument should not be used since the result of arbitrary functions shouldn’t depend on the request parameters.

ETag Caching

Caching via ETag involves sending the browser an ETag header so that it knows to save and possibly use a cached copy of the page from its own cache, instead of requesting the application to send a fresh copy.

Because the ETag cache relies on sending headers to the browser, it works in a slightly different manner to the other caching mechanisms described above.

The etag_cache() function will set the proper HTTP headers if the browser doesn’t yet have a copy of the page. Otherwise, a 304 HTTP exception will be raised, which is then caught by Paste middleware and turned into a proper 304 response to the browser. This causes the browser to use its own locally cached copy.

etag_cache() returns Response for legacy purposes (Response should be used directly instead).
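The flow described above, where the ETag is set on a first visit and a 304 is produced when the browser already holds a matching copy, can be sketched in plain Python. NotModified, check_etag and handle are hypothetical stand-ins for etag_cache() and the Paste middleware, and the If-None-Match handling is deliberately simplified (no weak validators and no comma-separated lists).

```python
class NotModified(Exception):
    """Stand-in for the 304 exception raised by etag_cache()."""


def check_etag(key, request_headers, response_headers):
    etag = '"%s"' % key  # entity tags are sent as quoted strings
    if request_headers.get('If-None-Match') == etag:
        raise NotModified()
    # The browser has no matching copy: send the tag so it can cache the page.
    response_headers['ETag'] = etag


def handle(request_headers, render_page):
    """Plays the role of the middleware that turns the exception into a 304."""
    response_headers = {}
    try:
        check_etag('somekey', request_headers, response_headers)
    except NotModified:
        return 304, {}, ''
    return 200, response_headers, render_page()


first = handle({}, lambda: '<html>page</html>')
repeat = handle({'If-None-Match': first[1]['ETag']}, lambda: 'not rendered')
```

On the repeat request the page function is never called, which is the whole point of ETag caching.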

ETag-based caching requires a single key, which is sent back to the browser in the ETag HTTP header. The RFC specification for HTTP headers indicates that an ETag header merely needs to be a string. The value of this string does not need to be unique for every URL, because the browser itself decides whether to use its own copy, and that decision is based on both the URL and the ETag key.

from pylons.controllers.util import etag_cache

def my_action(self):
    etag_cache('somekey')
    return render('/show.myt', cache_expire=3600)

Or to change other aspects of the response:

from pylons import response
from pylons.controllers.util import etag_cache

def my_action(self):
    etag_cache('somekey')
    response.headers['content-type'] = 'text/plain'
    return render('/show.myt', cache_expire=3600)

Note

In this example, template caching is used in addition to ETag caching. If a new visitor comes to the site, the template is not re-rendered if a cached copy exists, and repeat hits to the page by that user will then trigger the ETag cache. This example also never changes the ETag key, so the browser's cache will always be used if it has one.

The frequency with which an ETag cache key is changed will depend on the web application and the developer’s assessment of how often the browser should be prompted to fetch a fresh copy of the page.
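One way to tie the key to how often the content really changes (an assumption about your application, not something etag_cache() does for you) is to derive it from whatever versions the data, such as a last-modified timestamp. The sketch below hashes such values with hashlib from the standard library; make_etag_key and the timestamps are hypothetical.

```python
import hashlib


def make_etag_key(*parts):
    """Derive a short, stable ETag key from values that version the page."""
    raw = '|'.join(str(p) for p in parts)
    return hashlib.sha1(raw.encode('utf-8')).hexdigest()[:16]


# Same data version -> same key, so the browser keeps using its copy.
k1 = make_etag_key('/show', '2009-01-05T10:00:00')
k2 = make_etag_key('/show', '2009-01-05T10:00:00')
# After the data changes, the key changes and the browser fetches a fresh page.
k3 = make_etag_key('/show', '2009-01-06T08:30:00')
```

In a controller, the result would then be passed as the argument to etag_cache().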

Warning

Stolen from Philip Cooper’s OpenVest wiki after which it was updated and edited ...

Inside the Beaker Cache

Caching

First, let's start with a slow function that we would like to cache. This function is not actually slow, but it records when it was called, so we can tell whether the cache is working as expected:

import time

def slooow(myarg):
    # some slow database or template stuff here
    return "%s at %s" % (myarg, time.asctime())

With the function cached, repeated calls will tell us whether we are seeing a cached or a new version.

DBMCache

The DBMCache stores (actually pickles) the response in a dbm-style database.

What may not be obvious is that there are two levels of keys: one for the function or template name (called the namespace) and one for the keys within that (called the key). So for Some_Function_name, a cache is created as one dbm file/database, and as that function is called with different arguments, those arguments become keys within the dbm file. First, let's create and populate a cache. This cache might be a cache for the function Some_Function_name called three times with three different arguments: x, yy, and zzz:

from beaker.cache import CacheManager
cm = CacheManager(type='dbm', data_dir='beaker.cache')
cache = cm.get_cache('Some_Function_name')
# the cache is set up, but the dbm file is not created until needed
# so let's populate it with three values:
cache.get_value('x', createfunc=lambda: slooow('x'), expiretime=15)
cache.get_value('yy', createfunc=lambda: slooow('yy'), expiretime=15)
cache.get_value('zzz', createfunc=lambda: slooow('zzz'), expiretime=15)

Nothing much new yet. After getting the cache we can use the cache as per the Beaker Documentation.

import beaker.container as container
cc = container.ContainerContext()
nsm = cc.get_namespace_manager('Some_Function_name',
                               container.DBMContainer, data_dir='beaker.cache')
filename = nsm.file

Now we have the file name. The file name is a sha hash of a string which is a join of the container class name and the function name (used in the get_cache function call). It would return something like:

'beaker.cache/container_dbm/a/a7/a768f120e39d0248d3d2f23d15ee0a20be5226de.dbm'
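The shape of that path (a one-character directory, then a two-character directory, then the full hash) can be reproduced with hashlib. This is a sketch under assumptions: the exact string Beaker hashes may differ between versions, so the identifier below is a hypothetical join and its hash will not match the example path; only the directory layout is illustrated.

```python
import hashlib
import os


def hashed_cache_path(root, identifier, extension='.dbm'):
    """Build root/<h[0]>/<h[0:2]>/<h><ext> from a SHA-1 of identifier."""
    digest = hashlib.sha1(identifier.encode('utf-8')).hexdigest()
    return os.path.join(root, digest[0], digest[0:2], digest + extension)


# Hypothetical identifier: container class name joined with the namespace.
path = hashed_cache_path(os.path.join('beaker.cache', 'container_dbm'),
                         'DBMContainer_Some_Function_name')
```

Spreading files over short hash-prefix directories keeps any single directory from growing too large.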

With that file name you could look directly inside the cache database (but only for your education and debugging experience, not your cache interactions!)

## this file name can be used directly (for debug ONLY)
import anydbm
import pickle
db = anydbm.open(filename)
old_t, old_v = pickle.loads(db['zzz'])

The database only contains the old time and old value. Where did the expire time and the function to create/update the value go? They never make it to the database. They reside in the cache object returned by the get_cache call above.

Note that the createfunc and expiretime values are stored during the first call to get_value. Subsequent calls with (say) a different expiry time will not update that value. This is a tricky part of the caching, but perhaps a good thing, since different processes may have different policies in effect.

If there are difficulties with these values, remember that one call to cache.clear() resets everything.
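Because only the creation time and the value are stored, the expiry decision has to be made by the code that holds expiretime. A minimal sketch of that check, assuming the stored record is a pickled (created_time, value) pair as in the anydbm inspection above; is_stale is a hypothetical helper, not part of Beaker:

```python
import pickle
import time


def is_stale(record, expiretime, now=None):
    """Return True if a pickled (created, value) record is older than expiretime."""
    created, _value = pickle.loads(record)
    if now is None:
        now = time.time()
    # expiretime=None means the value never expires.
    return expiretime is not None and now - created >= expiretime


record = pickle.dumps((1000.0, 'zzz at Mon Jan  5 10:00:00 2009'))
fresh = is_stale(record, expiretime=15, now=1010.0)  # 10 s old
stale = is_stale(record, expiretime=15, now=1020.0)  # 20 s old
```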

Database Cache

Using the ext:database cache type.

from beaker.cache import CacheManager
# for comparison, the dbm version above used:
# cm = CacheManager(type='dbm', data_dir='beaker.cache')
cm = CacheManager(type='ext:database',
                  url="sqlite:///beaker.cache/beaker.sqlite",
                  data_dir='beaker.cache')
cache = cm.get_cache('Some_Function_name')
# the cache is set up, but nothing is stored until a value is requested
# so let's populate it with three values:
cache.get_value('x', createfunc=lambda: slooow('x'), expiretime=15)
cache.get_value('yy', createfunc=lambda: slooow('yy'), expiretime=15)
cache.get_value('zzz', createfunc=lambda: slooow('zzz'), expiretime=15)

This is identical to the cache usage above with the only difference being the creation of the CacheManager. It is much easier to view the caches outside the beaker code (again for edification and debugging, not for api usage).

SQLite was used in this instance, and the SQLite data file can be accessed directly using the SQLite command-line utility or the Firefox plug-in:

sqlite3 beaker.cache/beaker.sqlite
# from inside sqlite:
sqlite> .schema
CREATE TABLE beaker_cache (
        id INTEGER NOT NULL,
        namespace VARCHAR(255) NOT NULL,
        key VARCHAR(255) NOT NULL,
        value BLOB NOT NULL,
        PRIMARY KEY (id),
         UNIQUE (namespace, key)
);
sqlite> select * from beaker_cache;
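The same inspection can be done from Python with the standard-library sqlite3 module. To keep the sketch self-contained, it builds an in-memory database with the schema shown above and inserts one row by hand; against a real cache you would instead connect to beaker.cache/beaker.sqlite and only read from it. That the value column holds a pickled (created_time, value) pair is an assumption carried over from the DBM example.

```python
import pickle
import sqlite3

conn = sqlite3.connect(':memory:')  # use 'beaker.cache/beaker.sqlite' for a real cache
conn.execute("""
    CREATE TABLE beaker_cache (
        id INTEGER NOT NULL,
        namespace VARCHAR(255) NOT NULL,
        key VARCHAR(255) NOT NULL,
        value BLOB NOT NULL,
        PRIMARY KEY (id),
        UNIQUE (namespace, key)
    )""")
# Hypothetical row: a pickled (created_time, value) pair.
conn.execute(
    "INSERT INTO beaker_cache (id, namespace, key, value) VALUES (?, ?, ?, ?)",
    (1, 'Some_Function_name', 'zzz',
     sqlite3.Binary(pickle.dumps((1000.0, 'zzz at Mon Jan  5 10:00:00 2009')))))

rows = conn.execute(
    "SELECT namespace, key, value FROM beaker_cache ORDER BY key").fetchall()
decoded = []
for namespace, key, blob in rows:
    created, value = pickle.loads(bytes(blob))
    decoded.append((namespace, key, created, value))
    print(namespace, key, created, value)
```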

Warning

The data structure is different in Beaker 0.8 ...

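import sqlalchemy as sa
from sqlalchemy import types

# table_name and meta are supplied by Beaker itself; the values
# below are assumptions so the snippet can run on its own
table_name = 'beaker_cache'
meta = sa.MetaData()

cache = sa.Table(table_name, meta,
                 sa.Column('id', types.Integer, primary_key=True),
                 sa.Column('namespace', types.String(255), nullable=False),
                 sa.Column('accessed', types.DateTime, nullable=False),
                 sa.Column('created', types.DateTime, nullable=False),
                 sa.Column('data', types.BLOB(), nullable=False),
                 sa.UniqueConstraint('namespace'))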

It includes the access time but stores rows on a one-row-per-namespace basis (storing a pickled dict) rather than one row per namespace/key combination. This is a more efficient approach when handling a large number of namespaces with few keys each, such as sessions.
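To illustrate the difference, here is a sketch of the one-row-per-namespace layout using the standard-library sqlite3 module instead of SQLAlchemy: each row holds a pickled dict of all keys in a namespace, so reading one key means unpickling the whole dict. Column names follow the Beaker 0.8 table above; the helper functions and sample data are hypothetical.

```python
import pickle
import sqlite3
from datetime import datetime

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE beaker_cache (
        id INTEGER PRIMARY KEY,
        namespace VARCHAR(255) NOT NULL UNIQUE,
        accessed TIMESTAMP NOT NULL,
        created TIMESTAMP NOT NULL,
        data BLOB NOT NULL
    )""")


def save_namespace(namespace, values):
    """Store every key of a namespace as one pickled dict in a single row."""
    now = datetime.now().isoformat()
    conn.execute(
        "INSERT OR REPLACE INTO beaker_cache (namespace, accessed, created, data) "
        "VALUES (?, ?, ?, ?)",
        (namespace, now, now, sqlite3.Binary(pickle.dumps(values))))


def load_key(namespace, key):
    """Fetch the namespace row, unpickle the dict, then look up one key."""
    row = conn.execute("SELECT data FROM beaker_cache WHERE namespace = ?",
                       (namespace,)).fetchone()
    return None if row is None else pickle.loads(bytes(row[0])).get(key)


# A session is a natural fit: one namespace with only a handful of keys.
save_namespace('session-abc123', {'user_id': 42, 'theme': 'dark'})
```

The trade-off is visible in load_key: a namespace with many large keys would be unpickled in full for every lookup, which is why this layout suits small namespaces.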

Memcached Cache

For large numbers of keys with expensive per-key lookups, memcached is the way to go.

If memcached is running on the default port of 11211:

from beaker.cache import CacheManager
cm = CacheManager(type='ext:memcached', url='127.0.0.1:11211',
                  lock_dir='beaker.cache')
cache = cm.get_cache('Some_Function_name')
# the cache is set up, but nothing is stored until a value is requested
# so let's populate it with three values:
cache.get_value('x', createfunc=lambda: slooow('x'), expiretime=15)
cache.get_value('yy', createfunc=lambda: slooow('yy'), expiretime=15)
cache.get_value('zzz', createfunc=lambda: slooow('zzz'), expiretime=15)