2010/Stacks of Cache

From Open Source Bridge Wiki

This talk focuses on adapting and augmenting interfaces to memcache in order to overcome some of its limitations and to better utilize available resources. Then we’ll talk about combining those interfaces in a simple, snap-together fashion.

Speaker: Duncan Beevers

Contributed notes


Memcached: Constant time operations; asynchronous design.

Operations: set, cas, incr, decr, delete

Nice interfaces to MySQL and Postgres

Can store collections and access items individually.

Consistent hashing allows distributing keys over nodes while affecting only a small number of keys when a node is added or removed.  That is, nodes form a ring, and changing the number of nodes does not change the relative positions of the other nodes.  This is a client feature.
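
A minimal sketch of a client-side hash ring, to make the "only a small number of keys move" claim concrete.  The class name HashRing, the use of MD5, and the choice of 100 points per node are assumptions made for illustration; they are not taken from the talk or from any particular memcached client.

```python
import bisect
import hashlib


def _point(label: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical ring: each node owns several points; a key goes to the next point clockwise."""

    def __init__(self, nodes, points_per_node=100):
        self.points_per_node = points_per_node
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Adding a node only inserts new points; existing points do not move.
        for i in range(self.points_per_node):
            bisect.insort(self._ring, (_point(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, key):
        # First point at or after the key's position, wrapping around the ring.
        index = bisect.bisect(self._ring, (_point(key), ""))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {key: ring.node_for(key) for key in keys}
ring.add("cache-d")
after = {key: ring.node_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

Every key in moved now lives on the new node, and the rest stay where they were; with a plain "hash modulo node count" scheme most keys would change nodes instead.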

Standard serialization formats are available, but a custom format can get you better performance.

Multi-get - fetch multiple values across multiple nodes with two TCP calls.

Get-by-master-key - group values under a single master key so that they are stored on the same node.  But most of the time you should rely on the hashing algorithm.

Optimistic locking - uses compare-and-swap (cas) operation with a revision number to avoid overwriting data.
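
The read-modify-write loop behind optimistic locking can be sketched as follows.  FakeStore is an in-memory stand-in written for this example so that it runs without a server; its gets/cas methods only imitate the semantics of the memcached commands, and update_with_retry is a hypothetical helper name.

```python
class FakeStore:
    """In-memory stand-in for a memcached node; only mimics gets/cas semantics."""

    def __init__(self):
        self._data = {}  # key -> (value, revision)
        self._revision = 0

    def set(self, key, value):
        self._revision += 1
        self._data[key] = (value, self._revision)

    def gets(self, key):
        """Return (value, revision), or (None, None) on a miss."""
        return self._data.get(key, (None, None))

    def cas(self, key, value, revision):
        """Store only if nobody has written the key since `revision` was read."""
        current = self._data.get(key)
        if current is None or current[1] != revision:
            return False
        self.set(key, value)
        return True


def update_with_retry(store, key, change, attempts=5):
    """Read-modify-write loop that retries when another writer got in first."""
    for _ in range(attempts):
        value, revision = store.gets(key)
        if revision is None:
            return False  # key is missing; the caller decides how to seed it
        if store.cas(key, change(value), revision):
            return True
    return False
```

A stale revision makes cas return False instead of silently overwriting the newer value, which is the point of the technique.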

Compression options vary by client.  Some compress every value, some only large values.  Clients use the flags field to indicate that a value is compressed.  A custom client can give you more granular control.

Use the 'stats' command to measure eviction rates to see if your cache is full.  You can check the age of the oldest value too.

Generational keys can make cache invalidation easier.  But they can make optimizing eviction more complicated.  See virtual pools below.

Use a short-lived value to enforce rate-limiting.  Set the expiration to 1s or so and check whether the value is still there on the next request from the same user.  Memcached is also useful for storing captcha information since that is short-lived.
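
A rough sketch of the rate-limiting idea.  ExpiringStore is an in-memory stand-in with an injectable clock so the example runs without a server; allow_request and the "rate:" key prefix are hypothetical names.  The sketch assumes an add-style operation (store only if the key is absent), which folds the "is it still there?" check and the write into one call.

```python
import time


class ExpiringStore:
    """In-memory stand-in for memcached with per-key expiry; `clock` is injectable for tests."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def add(self, key, value, ttl):
        """Store only if the key is absent or expired, like an add command."""
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False
        self._data[key] = (value, now + ttl)
        return True


def allow_request(store, user_id, window_seconds=1):
    """Permit one request per user per window: the marker key must have expired."""
    return store.add(f"rate:{user_id}", 1, window_seconds)
```

A request is allowed only when the marker from the previous request has already expired, so each user gets at most one request per window.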

Namespace invalidation / generational keys - when something changes that should invalidate a lot of keys, use a new namespace prefix for keys going forward.  This performs better than invalidating keys individually.  But watch out for the previous generation taking up space while direct keys are evicted.
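
One way to sketch generational keys: keep a generation counter per namespace and build it into every key.  DictCache, namespaced_key, invalidate_namespace and the "gen:" prefix are all hypothetical names for this example, and a plain dict stands in for the cache.

```python
class DictCache:
    """Plain dict standing in for a cache client (no eviction, no expiry)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value


def namespaced_key(cache, namespace, key):
    """Build the real cache key from the namespace's current generation."""
    generation = cache.get(f"gen:{namespace}")
    if generation is None:
        generation = 1
        cache.set(f"gen:{namespace}", generation)
    return f"{namespace}:v{generation}:{key}"


def invalidate_namespace(cache, namespace):
    """Bump the generation; every key built afterwards misses the old entries."""
    current = cache.get(f"gen:{namespace}") or 1
    cache.set(f"gen:{namespace}", current + 1)


cache = DictCache()
old_key = namespaced_key(cache, "user:42", "profile")
cache.set(old_key, "old profile")
invalidate_namespace(cache, "user:42")
new_key = namespaced_key(cache, "user:42", "profile")
```

After the bump, the old entry is unreachable through new_key but still sits in the cache until it is evicted, which is the space cost mentioned above.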

Space allocated for large values will not be re-partitioned for smaller values until the server is restarted - so watch out for memory leaks from that.  To get around this your client can split large values into multiple keys.  Use get-by-master-key with this to keep those keys grouped together.
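
Splitting a large value across several keys might look like the sketch below.  The function names, the ":chunk:" and ":chunks" key suffixes, and the 1024-byte default are assumptions for illustration; a dict stands in for the cache, and a real client would additionally route every chunk with the original key as its master key so they land on one node.

```python
def set_chunked(cache, key, payload: bytes, chunk_size=1024):
    """Store `payload` as fixed-size chunks plus a manifest holding the chunk count."""
    chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]
    for index, chunk in enumerate(chunks):
        cache[f"{key}:chunk:{index}"] = chunk
    # Write the manifest last so readers never see a count without its chunks.
    cache[f"{key}:chunks"] = len(chunks)


def get_chunked(cache, key):
    """Reassemble the payload; any missing piece counts as a full cache miss."""
    count = cache.get(f"{key}:chunks")
    if count is None:
        return None
    parts = []
    for index in range(count):
        chunk = cache.get(f"{key}:chunk:{index}")
        if chunk is None:
            return None  # one chunk was evicted, so the value is unusable
        parts.append(chunk)
    return b"".join(parts)


cache = {}
payload = bytes(range(256)) * 20  # 5120 bytes
set_chunked(cache, "report", payload)
```

Each stored item is now small, so it fits the same size class as the rest of the cache's values.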

You can add an additional TTL value in a payload read by your app to avoid dog-piling problems.  The outer TTL needs to be longer than the inner TTL.  If you rely on the native memcached TTL instead, you can get a lot of cache misses at once, which can cause the workers that are supposed to regenerate that key to perform duplicate work.  With nested TTLs you can have successive requests for the same value return a "working on it" message from the app.
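
A sketch of nested TTLs, under stated assumptions: TtlCache is an in-memory stand-in with an injectable clock, read_through and the grace parameter are hypothetical, and the stale value is served to concurrent requests as a stand-in for the "working on it" response.  The get-then-set sequence is not atomic here; a real client would need something like an add-based lock to guarantee a single regenerating worker.

```python
import time


class TtlCache:
    """In-memory stand-in whose entries vanish after the outer (native) TTL."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            return None
        return entry[0]


def read_through(cache, key, regenerate, inner_ttl=60, outer_ttl=300,
                 grace=10, clock=time.time):
    """Serve from cache; refresh when the inner TTL stored in the payload has passed."""
    now = clock()
    entry = cache.get(key)
    if entry is not None and entry["stale_at"] > now:
        return entry["value"]
    if entry is not None:
        # Stale but still present: push the inner TTL out briefly so requests
        # arriving during regeneration keep getting the old value.
        cache.set(key, {"value": entry["value"], "stale_at": now + grace}, outer_ttl)
    value = regenerate()
    cache.set(key, {"value": value, "stale_at": now + inner_ttl}, outer_ttl)
    return value
```

Because the outer TTL is longer than the inner one, the entry is still in the cache when the app decides it is stale, so only the request that notices the staleness pays for regeneration.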

Granular replication - You can store keys on the appropriate node and also on the next node in the hash ring.  Results in more writes but allows you to remove a node safely.
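
A small sketch of writing each key to its own node and to the next node on the ring.  ReplicatedRing is a hypothetical class; dicts stand in for servers, each node gets a single ring point for brevity, and MD5 is an arbitrary hash choice.

```python
import bisect
import hashlib


def _point(label: str) -> int:
    """Position on the ring for a node name or key."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class ReplicatedRing:
    """Hypothetical client that stores every key on its node and on the next node."""

    def __init__(self, node_names):
        self._ring = sorted((_point(name), name) for name in node_names)
        self.stores = {name: {} for name in node_names}  # dicts stand in for servers

    def _primary_and_next(self, key):
        index = bisect.bisect(self._ring, (_point(key), "")) % len(self._ring)
        primary = self._ring[index][1]
        backup = self._ring[(index + 1) % len(self._ring)][1]
        return primary, backup

    def set(self, key, value):
        # Two writes per set: this is the extra write cost of replication.
        for name in self._primary_and_next(key):
            self.stores[name][key] = value

    def get(self, key):
        for name in self._primary_and_next(key):
            if key in self.stores[name]:
                return self.stores[name][key]
        return None

    def remove_node(self, name):
        self._ring = [entry for entry in self._ring if entry[1] != name]
        del self.stores[name]


ring = ReplicatedRing(["cache-a", "cache-b", "cache-c"])
for i in range(50):
    ring.set(f"k{i}", i)
ring.remove_node("cache-b")
```

When a node disappears, the keys it owned fall to the next node on the ring, which already holds a copy, so reads keep hitting.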

Virtual pools - partition clusters so that keys with frequent evictions are kept on separate clusters.

Memcached flags can represent special handling for values.  They are stored as a 32-bit bit vector.  In your client, use bit shift operations when setting or reading flags so that each data adapter/filter does not have to be assigned a specific index in the bit vector.  As long as the flags pass through the same stack of adapters in order, you are good.
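
The bit-shift idea can be sketched like this: on write, each adapter shifts the flags left and records its own bit at the low end; on read, the stack runs in reverse and each adapter pops its bit back off.  The Adapter base class, the two concrete adapters, and the 64-byte threshold are all assumptions made for this example.

```python
import json
import zlib


class Adapter:
    """Hypothetical base class: each adapter owns one bit, pushed onto the low end."""

    def write(self, value, flags):
        used = self.applies(value)
        if used:
            value = self.encode(value)
        return value, (flags << 1) | int(used)

    def read(self, value, flags):
        if flags & 1:
            value = self.decode(value)
        return value, flags >> 1


class JsonAdapter(Adapter):
    def applies(self, value):
        return not isinstance(value, bytes)  # raw bytes pass through untouched

    def encode(self, value):
        return json.dumps(value).encode("utf-8")

    def decode(self, value):
        return json.loads(value.decode("utf-8"))


class CompressAdapter(Adapter):
    def __init__(self, threshold=64):
        self.threshold = threshold

    def applies(self, value):
        return len(value) > self.threshold  # only compress large values

    def encode(self, value):
        return zlib.compress(value)

    def decode(self, value):
        return zlib.decompress(value)


def write_through(stack, value):
    flags = 0
    for adapter in stack:
        value, flags = adapter.write(value, flags)
    return value, flags


def read_through(stack, value, flags):
    for adapter in reversed(stack):
        value, flags = adapter.read(value, flags)
    return value


stack = [JsonAdapter(), CompressAdapter(threshold=64)]
small_data, small_flags = write_through(stack, {"a": 1})
large_data, large_flags = write_through(stack, {"a": "x" * 500})
```

No adapter knows its position in the bit vector; the only constraint is that reads use the same stack in the same order as writes, and that the stack has at most 32 adapters.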

Compose adapters similarly to Rack middleware components.  Adapters could do stuff like serializing, conditionally compressing, etc.  Stick to one feature per adapter for best modularity.
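
A sketch of middleware-style composition, where each layer wraps the next and exposes the same get/set interface.  The layer classes and build_stack are hypothetical names for this example, a dict-backed store stands in for memcached, and compression is unconditional here to keep each layer tiny.

```python
import json
import zlib
from functools import partial


class DictStore:
    """Innermost layer: a dict standing in for the memcached client."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value


class JsonLayer:
    """Serializes values; knows nothing about compression or key naming."""

    def __init__(self, inner):
        self.inner = inner

    def set(self, key, value):
        self.inner.set(key, json.dumps(value).encode("utf-8"))

    def get(self, key):
        raw = self.inner.get(key)
        return None if raw is None else json.loads(raw.decode("utf-8"))


class ZlibLayer:
    """Compresses bytes on the way in and decompresses on the way out."""

    def __init__(self, inner):
        self.inner = inner

    def set(self, key, value):
        self.inner.set(key, zlib.compress(value))

    def get(self, key):
        raw = self.inner.get(key)
        return None if raw is None else zlib.decompress(raw)


class PrefixLayer:
    """Namespaces keys; leaves values alone."""

    def __init__(self, inner, prefix):
        self.inner = inner
        self.prefix = prefix

    def set(self, key, value):
        self.inner.set(self.prefix + key, value)

    def get(self, key):
        return self.inner.get(self.prefix + key)


def build_stack(store, *layers):
    """Wrap `store` so that the first layer listed is the outermost one."""
    for layer in reversed(layers):
        store = layer(store)
    return store


backend = DictStore()
cache = build_stack(backend, JsonLayer, ZlibLayer, partial(PrefixLayer, prefix="app:"))
cache.set("k", {"a": [1, 2]})
```

Because every layer does one thing, layers can be reordered, dropped, or reused in a different cache without touching the others.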

Rails wants you to use a single, global level cache.  What may be better is to set up separate caches for different responsibilities.

Your views and controllers should know about each other's caches to prevent duplicate DB round trips.