.. _caching:

=======
Caching
=======

Inevitably, there will be occasions during application development or deployment when some task is revealed to be taking a significant amount of time to complete. When this occurs, :term:`caching` is often the best way to speed things up.

Pylons comes with caching middleware enabled that is part of the same package that provides the session handling, Beaker. Beaker supports a variety of caching backends: memory-based, filesystem-based and the specialised `memcached` library.

There are several ways to cache data under Pylons, depending on where the slowdown is occurring:

* Browser-side Caching - HTTP/1.1 supports the :term:`ETag` caching system that allows the browser to use its own cache instead of requiring regeneration of the entire page. ETag-based caching avoids repeated generation of content, but if the browser has never seen the page before, the page will still be generated. Therefore, using ETag caching in conjunction with one of the other types of caching listed here will achieve optimal throughput and avoid unnecessary calls on resource-intensive operations.

  .. note:: The latter only helps if the entire page can be cached.

* Controllers - The `cache` object can be imported in controllers and used for caching anything in Python that can be pickled.

* Templates - The results of an entire rendered template can be cached using the three cache keyword arguments to the render calls. These render commands can also be used inside templates.

* Mako/Myghty Templates - Built-in caching options are available for both the Mako and Myghty template engines. They allow fine-grained caching of only certain sections of the template as well as caching of the entire template.

The two primary concepts to bear in mind when caching are i) caches have a *namespace* and ii) caches can have *keys* under that namespace.
The reason for this is that, for a single template, there might be multiple versions of the template, each requiring its own cached version. The keys in the namespace are the ``version`` and the name of the template is the ``namespace``. **Both of these values must be Python strings.**

In templates, the cache ``namespace`` will automatically be set to the name of the template being rendered. Nothing else is required for basic caching, unless the developer wishes to control how long the template is cached and/or maintain caches of multiple versions of the template.

.. seealso:: Stephen Pierzchala's *Caching for Performance* (stephen@pierzchala.com)

Using the Cache object
----------------------

Inside the controller, the `cache` object needs to be imported before being used. If an action or block of code makes heavy use of resources or takes a long time to complete, it can be convenient to cache the result. The `cache` object can cache any Python structure that can be pickled.

Consider an action where it is desirable to cache some code that does a time-consuming or resource-intensive lookup and returns an object that can be pickled (list, dict, tuple, etc.):

.. code-block:: python

    # Add to existing imports
    from pylons import cache

    # Under the controller class
    def some_action(self, day):
        # hypothetical action that uses a 'day' variable as its key

        def expensive_function():
            # do something that takes a lot of cpu/resources
            return expensive_call()

        # Get a cache for a specific namespace, you can name it whatever
        # you want, in this case it's 'my_function'
        mycache = cache.get_cache('my_function', type="memory")

        # Get the value, this will create the cache copy the first time
        # and any time it expires (in seconds, so 3600 = one hour)
        c.myvalue = mycache.get_value(key=day, createfunc=expensive_function,
                                      expiretime=3600)

        return render('/some/template.myt')

The `createfunc` option requires a callable object or a function, which is then called by the cache whenever a value for the provided key is not in the cache or has expired. Because the `createfunc` is called with no arguments, the resource- or time-expensive function must correspondingly not require any arguments.

Other Cache Options
^^^^^^^^^^^^^^^^^^^

The cache also supports the removal of values from the cache, using the key(s) to identify the value(s) to be removed. It also supports clearing the cache completely, should it need to be reset.

.. code-block:: python

    # Clear the cache
    mycache.clear()

    # Remove a specific key
    mycache.remove_value('some_key')

Using Cache keywords to `render`
--------------------------------

.. warning:: Needs to be extended to cover the specific render_* calls introduced in Pylons 0.9.7

All :func:`render `_ indicates that an ETag header merely needs to be a string. The value of this string does not need to be unique for every URL, as the browser itself determines whether to use its own copy; this decision is based on the URL and the ETag key.

.. code-block:: python

    # Add to existing imports
    from pylons.controllers.util import etag_cache

    def my_action(self):
        etag_cache('somekey')
        return render('/show.myt', cache_expire=3600)

Or to change other aspects of the response:

.. code-block:: python

    def my_action(self):
        etag_cache('somekey')
        response.headers['content-type'] = 'text/plain'
        return render('/show.myt', cache_expire=3600)

.. note:: In this example we are using template caching in addition to ETag caching. If a new visitor comes to the site, we avoid re-rendering the template if a cached copy exists, and repeat hits to the page by that user will then trigger the ETag cache. This example also never changes the ETag key, so the browser's cache will always be used if it has one.

The frequency with which an ETag cache key is changed will depend on the web application and the developer's assessment of how often the browser should be prompted to fetch a fresh copy of the page.

.. warning:: Stolen from Philip Cooper's OpenVest wiki, after which it was updated and edited ...

Inside the Beaker Cache
-----------------------

Caching
^^^^^^^

First, let's start out with some **slow** function that we would like to cache. This function is not slow, but it will show us when it was cached so we can see that things are working as we expect:

.. code-block:: python

    import time

    def slooow(myarg):
        # some slow database or template stuff here
        return "%s at %s" % (myarg, time.asctime())

Once the function is cached, the timestamp in each result tells us whether we are seeing a cached or a new version.

DBMCache
^^^^^^^^

The DBMCache stores (actually pickles) the response in a dbm-style database. What may not be obvious is that there are two levels of keys. They are essentially created as one for the function or template name (called the namespace) and one for the "keys" within that (called the key). So for `Some_Function_name`, there is a cache created as one dbm file/database. As that function is called with different arguments, those arguments are keys within the dbm file.

First, let's create and populate a cache. This cache might be a cache for the function `Some_Function_name` called three times with three different arguments: `x`, `yy`, and `zzz`:

.. code-block:: python

    from beaker.cache import CacheManager

    cm = CacheManager(type='dbm', data_dir='beaker.cache')
    cache = cm.get_cache('Some_Function_name')

    # the cache is set up but the dbm file is not created until needed
    # so let's populate it with three values:
    cache.get_value('x', createfunc=lambda: slooow('x'), expiretime=15)
    cache.get_value('yy', createfunc=lambda: slooow('yy'), expiretime=15)
    cache.get_value('zzz', createfunc=lambda: slooow('zzz'), expiretime=15)

Nothing much new yet. After getting the cache, we can use it as per the Beaker documentation. To find the file that backs the cache, ask the namespace manager:

.. code-block:: python

    import beaker.container as container

    cc = container.ContainerContext()
    nsm = cc.get_namespace_manager('Some_Function_name',
                                   container.DBMContainer,
                                   data_dir='beaker.cache')
    filename = nsm.file

Now we have the file name. The file name is a `sha` hash of a string which is a join of the container class name and the function name (used in the `get_cache` function call). It would return something like:

.. code-block:: python

    'beaker.cache/container_dbm/a/a7/a768f120e39d0248d3d2f23d15ee0a20be5226de.dbm'

With that file name you could look directly inside the cache database (but only for your education and debugging experience, **not** your cache interactions!):

.. code-block:: python

    ## this file name can be used directly (for debug ONLY)
    import anydbm  # Python 2 module; it is named ``dbm`` in Python 3
    import pickle

    db = anydbm.open(filename)
    old_t, old_v = pickle.loads(db['zzz'])

The database only contains the old time and old value. Where did the expire time and the function to create/update the value go? They never make it to the database. They reside in the `cache` object returned from the `get_cache` call above.

Note that the `createfunc` and `expiretime` values are stored during the first call to `get_value`. Subsequent calls with (say) a different expiry time will **not** update that value. This is a tricky part of the caching, but perhaps it is a good thing, since different processes may have different policies in effect.
If there are difficulties with these values, remember that one call to :func:`cache.clear` resets everything.

Database Cache
^^^^^^^^^^^^^^

Using the `ext:database` cache type:

.. code-block:: python

    from beaker.cache import CacheManager

    #cm = CacheManager(type='dbm', data_dir='beaker.cache')
    cm = CacheManager(type='ext:database',
                      url="sqlite:///beaker.cache/beaker.sqlite",
                      data_dir='beaker.cache')
    cache = cm.get_cache('Some_Function_name')

    # the cache is set up; let's populate it with three values:
    cache.get_value('x', createfunc=lambda: slooow('x'), expiretime=15)
    cache.get_value('yy', createfunc=lambda: slooow('yy'), expiretime=15)
    cache.get_value('zzz', createfunc=lambda: slooow('zzz'), expiretime=15)

This is identical to the cache usage above, with the only difference being the creation of the `CacheManager`. It is much easier to view the caches outside the Beaker code (again for edification and debugging, not for API usage). SQLite was used in this instance, and the SQLite data file can be accessed directly using the SQLite command-line utility or the Firefox plug-in:

.. code-block:: text

    sqlite3 beaker.cache/beaker.sqlite

    # from inside sqlite:
    sqlite> .schema
    CREATE TABLE beaker_cache (
        id INTEGER NOT NULL,
        namespace VARCHAR(255) NOT NULL,
        key VARCHAR(255) NOT NULL,
        value BLOB NOT NULL,
        PRIMARY KEY (id),
        UNIQUE (namespace, key)
    );
    select * from beaker_cache;

.. warning:: The data structure is different in Beaker 0.8 ...

.. code-block:: python

    cache = sa.Table(table_name, meta,
        sa.Column('id', types.Integer, primary_key=True),
        sa.Column('namespace', types.String(255), nullable=False),
        sa.Column('accessed', types.DateTime, nullable=False),
        sa.Column('created', types.DateTime, nullable=False),
        sa.Column('data', types.BLOB(), nullable=False),
        sa.UniqueConstraint('namespace')
    )

It includes the access time but stores rows on a one-row-per-namespace basis (storing a pickled dict) rather than one row per namespace/key combination. This is a more efficient approach when the problem is handling a large number of namespaces with limited keys, such as sessions.

Memcached Cache
^^^^^^^^^^^^^^^

For large numbers of keys with expensive per-key lookups, memcached is the way to go.

If memcached is running on the default port of 11211:

.. code-block:: python

    from beaker.cache import CacheManager

    cm = CacheManager(type='ext:memcached', url='127.0.0.1:11211',
                      lock_dir='beaker.cache')
    cache = cm.get_cache('Some_Function_name')

    # the cache is set up; let's populate it with three values:
    cache.get_value('x', createfunc=lambda: slooow('x'), expiretime=15)
    cache.get_value('yy', createfunc=lambda: slooow('yy'), expiretime=15)
    cache.get_value('zzz', createfunc=lambda: slooow('zzz'), expiretime=15)
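
All of the backends above share the same calling pattern: a namespace, a key within it, a zero-argument `createfunc`, and an `expiretime` in seconds. The following is a minimal, standard-library-only sketch of that pattern, offered purely as an illustration. It is **not** Beaker's implementation: the `TinyCache` class, its `clock` parameter and the call-counting variant of `slooow` are all hypothetical names introduced here, and the fake clock exists only so that expiry can be shown without waiting.

```python
import time


class TinyCache(object):
    """Hypothetical stand-in for one cache namespace (not Beaker code)."""

    def __init__(self, namespace, clock=time.time):
        self.namespace = namespace
        self._clock = clock      # injectable so expiry can be simulated
        self._values = {}        # key -> (stored_at, value)

    def get_value(self, key, createfunc, expiretime=None):
        now = self._clock()
        entry = self._values.get(key)
        if entry is not None:
            stored_at, value = entry
            # Serve the stored copy while it is still fresh
            if expiretime is None or now - stored_at < expiretime:
                return value
        # Missing or expired: call the zero-argument createfunc and store
        value = createfunc()
        self._values[key] = (now, value)
        return value

    def remove_value(self, key):
        self._values.pop(key, None)

    def clear(self):
        self._values.clear()


calls = []


def slooow(myarg):
    # Records each real invocation so cache hits are visible
    calls.append(myarg)
    return "%s (call %d)" % (myarg, len(calls))


fake_now = [0.0]  # assumption: a fake clock, purely for illustration
cache = TinyCache('Some_Function_name', clock=lambda: fake_now[0])

first = cache.get_value('x', createfunc=lambda: slooow('x'), expiretime=15)
second = cache.get_value('x', createfunc=lambda: slooow('x'), expiretime=15)

fake_now[0] = 20.0  # pretend 20 seconds have passed, beyond the 15s expiry
third = cache.get_value('x', createfunc=lambda: slooow('x'), expiretime=15)
```

Here `first` and `second` are the same stored value, while `third` is regenerated because the entry is older than its expiry time. Wrapping the call in a `lambda` is what lets a function that takes arguments satisfy the zero-argument `createfunc` requirement.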