Redis Data Eviction Policies (Part 3): LFU


License: CC 4.0 BY-SA


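This article describes the LFU eviction policy introduced in Redis 4.0.0, focusing on how LFU works, how the counter is managed, the design of the eviction pool, the eviction flow, and the improvement to the random policies. The key idea is to combine access frequency with time when deciding what to evict, which makes memory use more efficient.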


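Redis's LRU eviction policy looks only at when a key was last accessed. A one-off scan over many keys therefore leaves keys that will never be accessed again sitting in memory, while data that is about to be needed gets evicted.
For this reason Redis 4.0.0 introduced LFU. LFU refines LRU: on top of time, it also takes the number of accesses into account.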

typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits decreas time). */
    int refcount;
    void *ptr;
} robj;

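The original 24-bit lru field is split into two parts, ldt (upper 16 bits) and count (lower 8 bits). ldt stores a time in minutes: it is set when the object is created and refreshed each time the count is decayed. count stores the access frequency.
Because count has only 8 bits, its maximum value is 255, so it is not incremented on every access. Instead it is incremented according to a probabilistic rule.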
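
The layout can be sketched with three small helpers. This is only an illustration: the names lfu_pack, lfu_ldt and lfu_count are hypothetical and do not exist in Redis; only the 16-bit/8-bit split comes from the struct comment above.

```c
#include <stdint.h>

/* Hypothetical helpers (not Redis functions) for the 24-bit layout:
 * upper 16 bits = ldt in minutes, lower 8 bits = count. */
static uint32_t lfu_pack(uint32_t ldt_minutes, uint8_t count) {
    return ((ldt_minutes & 0xFFFFu) << 8) | count;
}

/* Extract the 16-bit time part. */
static uint32_t lfu_ldt(uint32_t lru) {
    return (lru >> 8) & 0xFFFFu;
}

/* Extract the 8-bit counter part. */
static uint8_t lfu_count(uint32_t lru) {
    return (uint8_t)(lru & 0xFFu);
}
```

Packing a time of 1234 minutes with a count of 5 and unpacking it again gives back the same two values, and the packed value always fits in 24 bits.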

1. Eviction pool entry definition

#define EVPOOL_SIZE 16
#define EVPOOL_CACHED_SDS_SIZE 255
struct evictionPoolEntry {
    unsigned long long idle;    /* Object idle time (inverse frequency for LFU) */
    sds key;                    /* Key name. */
    sds cached;                 /* Cached SDS object for key name. */
    int dbid;                   /* Key DB number. */
};

// Global eviction pool pointer
static struct evictionPoolEntry *EvictionPoolLRU;

2. Allocating the eviction pool

/* Create a new eviction pool. */
void evictionPoolAlloc(void) {
    struct evictionPoolEntry *ep;
    int j;

    ep = zmalloc(sizeof(*ep)*EVPOOL_SIZE);
    for (j = 0; j < EVPOOL_SIZE; j++) {
        ep[j].idle = 0;
        ep[j].key = NULL;
        ep[j].cached = sdsnewlen(NULL,EVPOOL_CACHED_SDS_SIZE);
        ep[j].dbid = 0;
    }
    EvictionPoolLRU = ep;
}

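There is a single global eviction pool. It holds eviction candidates from all DBs, so a dbid field was added to record which DB each key belongs to.
If space were allocated dynamically every time a key entered the pool, it would cause memory fragmentation and hurt performance. The cached field avoids this: its buffer is allocated up front and reused, which cuts down on repeated allocations. The preallocated buffer holds 255 characters, so a key longer than 255 still needs a dynamic allocation.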
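
A minimal sketch of the same reuse idea, written with plain C strings instead of sds so that it stands alone. The struct pool_slot and the functions slot_set_key and slot_clear_key are hypothetical names for this example, not Redis code.

```c
#include <stdlib.h>
#include <string.h>

#define CACHED_SIZE 255   /* mirrors EVPOOL_CACHED_SDS_SIZE */

/* Simplified stand-in for evictionPoolEntry; all names are hypothetical. */
struct pool_slot {
    char *key;                     /* points at cached, or at heap memory */
    char cached[CACHED_SIZE + 1];  /* preallocated buffer, reused */
};

/* Store a key in the slot: reuse the buffer when the key fits,
 * fall back to a heap copy when it does not.
 * Returns 0 on success, -1 if the allocation fails. */
static int slot_set_key(struct pool_slot *s, const char *key) {
    size_t len = strlen(key);
    if (len > CACHED_SIZE) {
        char *copy = malloc(len + 1);
        if (copy == NULL) return -1;
        memcpy(copy, key, len + 1);
        s->key = copy;
    } else {
        memcpy(s->cached, key, len + 1);
        s->key = s->cached;
    }
    return 0;
}

/* Release the key; only heap copies need freeing. */
static void slot_clear_key(struct pool_slot *s) {
    if (s->key != NULL && s->key != s->cached) free(s->key);
    s->key = NULL;
}
```

The comparison s->key != s->cached plays the same role as the pool[k].key != pool[k].cached check in the Redis code: it tells whether the key lives in the reusable buffer or in separately allocated memory.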

3. Object creation

So that a newly created object is not evicted straight away, count starts at 5.

#define LFU_INIT_VAL 5

robj *createObject(int type, void *ptr) {
    robj *o = zmalloc(sizeof(*o));
    o->type = type;
    o->encoding = OBJ_ENCODING_RAW;
    o->ptr = ptr;
    o->refcount = 1;

    /* Set the LRU to the current lruclock (minutes resolution), or
     * alternatively the LFU counter. */
    if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
        o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL;
    } else {
        o->lru = LRU_CLOCK();
    }
    return o;
}

unsigned long LFUGetTimeInMinutes(void) {
    return (server.unixtime/60) & 65535;
}

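4. Updating the count on access

Because count has only 8 bits and tops out at 255, it cannot simply be incremented on every access, so a dedicated increment algorithm is used: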

# 1. A random number R between 0 and 1 is extracted.
# 2. A probability P is calculated as 1/(old_value*lfu_log_factor+1).
# 3. The counter is incremented only if R < P.

/* Logarithmically increment a counter. The greater is the current counter value
 * the less likely is that it gets really implemented. Saturate it at 255. */
uint8_t LFULogIncr(uint8_t counter) {
    if (counter == 255) return 255;
    double r = (double)rand()/RAND_MAX;
    double baseval = counter - LFU_INIT_VAL;
    if (baseval < 0) baseval = 0;
    double p = 1.0/(baseval*server.lfu_log_factor+1);
    if (r < p) counter++;
    return counter;
}

The official configuration file lists the following test results for different values of factor:

# +--------+------------+------------+------------+------------+------------+
# | factor | 100 hits   | 1000 hits  | 100K hits  | 1M hits    | 10M hits   |
# +--------+------------+------------+------------+------------+------------+
# | 0      | 104        | 255        | 255        | 255        | 255        |
# +--------+------------+------------+------------+------------+------------+
# | 1      | 18         | 49         | 255        | 255        | 255        |
# +--------+------------+------------+------------+------------+------------+
# | 10     | 10         | 18         | 142        | 255        | 255        |
# +--------+------------+------------+------------+------------+------------+
# | 100    | 8          | 11         | 49         | 143        | 255        |
# +--------+------------+------------+------------+------------+------------+
#
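
To get a feel for how the factor slows the counter down, the increment rule can be simulated. The sketch below copies the logic of LFULogIncr but takes the factor as a parameter; log_incr and simulate_hits are hypothetical names for this example. Results depend on the random seed, so individual runs will not necessarily reproduce the table exactly.

```c
#include <stdint.h>
#include <stdlib.h>

#define INIT_VAL 5  /* same value as LFU_INIT_VAL */

/* Same logic as LFULogIncr, but with the factor passed in as a
 * parameter instead of read from server.lfu_log_factor. */
static uint8_t log_incr(uint8_t counter, int factor) {
    if (counter == 255) return 255;
    double r = (double)rand() / RAND_MAX;
    double baseval = counter - INIT_VAL;
    if (baseval < 0) baseval = 0;
    double p = 1.0 / (baseval * factor + 1);
    if (r < p) counter++;
    return counter;
}

/* Apply `hits` accesses to a freshly created counter and return it. */
static uint8_t simulate_hits(long hits, int factor) {
    uint8_t counter = INIT_VAL;
    for (long i = 0; i < hits; i++) counter = log_incr(counter, factor);
    return counter;
}
```

Calling simulate_hits(1000, 10) after seeding with srand gives one sample for the "1000 hits" column at factor 10. With factor 0 the probability is always 1, so the counter climbs by one on practically every access until it saturates at 255.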

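The count is updated on every access, except while an RDB save or AOF rewrite child process is running (updating then would trigger unnecessary copy-on-write).
Operations that pass the LOOKUP_NOTOUCH flag do not update it either (for example the commands type, ttl, pttl, swapdb).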

robj *lookupKey(redisDb *db, robj *key, int flags) {
    dictEntry *de = dictFind(db->dict,key->ptr);
    if (de) {
        robj *val = dictGetVal(de);

        /* Update the access time for the ageing algorithm.
         * Don't do it if we have a saving child, as this will trigger
         * a copy on write madness. */
        if (server.rdb_child_pid == -1 &&
            server.aof_child_pid == -1 &&
            !(flags & LOOKUP_NOTOUCH))
        {
            if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
                unsigned long ldt = val->lru >> 8;
                unsigned long counter = LFULogIncr(val->lru & 255);
                val->lru = (ldt << 8) | counter;
            } else {
                val->lru = LRU_CLOCK();
            }
        }
        return val;
    } else {
        return NULL;
    }
}

5. Count decay

#define LFU_DECR_INTERVAL 1
unsigned long LFUDecrAndReturn(robj *o) {
    unsigned long ldt = o->lru >> 8;
    unsigned long counter = o->lru & 255;
    if (LFUTimeElapsed(ldt) >= server.lfu_decay_time && counter) {
        if (counter > LFU_INIT_VAL*2) {
            counter /= 2;
            if (counter < LFU_INIT_VAL*2) counter = LFU_INIT_VAL*2;
        } else {
            counter--;
        }
        o->lru = (LFUGetTimeInMinutes()<<8) | counter;
    }
    return counter;
}

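Once at least server.lfu_decay_time minutes (configurable with lfu-decay-time xxx, default 1 minute) have passed since the time stored in ldt, the count is decayed:

If the count is greater than 10, it is halved (but never to less than 10).
If the count is 10 or less, it is decremented by 1.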
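
The decay rule can be traced with a small standalone sketch. It reproduces only the counter arithmetic from LFUDecrAndReturn and ignores the time check; decay_step and steps_to_zero are hypothetical names for this example.

```c
#define INIT_VAL 5UL  /* same value as LFU_INIT_VAL */

/* One decay step, following the branch in LFUDecrAndReturn that runs
 * once the decay time has elapsed. */
static unsigned long decay_step(unsigned long counter) {
    if (counter == 0) return 0;
    if (counter > INIT_VAL * 2) {
        counter /= 2;
        if (counter < INIT_VAL * 2) counter = INIT_VAL * 2;
    } else {
        counter--;
    }
    return counter;
}

/* Number of decay steps needed to bring a counter down to zero. */
static int steps_to_zero(unsigned long counter) {
    int steps = 0;
    while (counter > 0) {
        counter = decay_step(counter);
        steps++;
    }
    return steps;
}
```

Starting from the maximum of 255, the sequence is 127, 63, 31, 15, 10, then 9, 8, and so on down to 0, which takes 15 steps in total. Each step needs the decay time to have elapsed again, and in the code above it only happens when the key is sampled for eviction.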

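For example, for a key whose count starts at 500 (an illustrative value; the stored 8-bit counter cannot exceed 255), the count decays over time as follows.
[Figure: decay of count over time]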

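6. Selecting candidate keys from all DBs

Previously each DB had its own eviction pool. Now there is only one global pool, and every candidate key is written into it.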

int freeMemoryIfNeeded(void) {
...
if (server.maxmemory_policy & (MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_LFU) ||
        server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL)
    {
        struct evictionPoolEntry *pool = EvictionPoolLRU;

        while(bestkey == NULL) {
            unsigned long total_keys = 0, keys;

            /* We don't want to make local-db choices when expiring keys,
             * so to start populate the eviction pool sampling keys from
             * every DB. */
            for (i = 0; i < server.dbnum; i++) {
                db = server.db+i;
                dict = (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) ?
                        db->dict : db->expires;
                if ((keys = dictSize(dict)) != 0) {
                    evictionPoolPopulate(i, dict, db->dict, pool);
                    total_keys += keys;
                }
            }
            if (!total_keys) break; /* No keys to evict. */

            /* Go backward from best to worst element to evict. */
            for (k = EVPOOL_SIZE-1; k >= 0; k--) {
                if (pool[k].key == NULL) continue;
                bestdbid = pool[k].dbid;

                if (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) {
                    de = dictFind(server.db[pool[k].dbid].dict,
                        pool[k].key);
                } else {
                    de = dictFind(server.db[pool[k].dbid].expires,
                        pool[k].key);
                }

                /* Remove the entry from the pool. */
                if (pool[k].key != pool[k].cached)
                    sdsfree(pool[k].key);
                pool[k].key = NULL;
                pool[k].idle = 0;

                /* If the key exists, is our pick. Otherwise it is
                 * a ghost and we need to try the next element. */
                if (de) {
                    bestkey = dictGetKey(de);
                    break;
                } else {
                    /* Ghost... Iterate again. */
                }
            }
        }
    }
...
}

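Eviction always sorts by idle, so each policy maps its own metric onto that field. LFU uses 255-count, so the smaller the count, the larger the idle.
TTL uses ULLONG_MAX - ttl, so the smaller the ttl, the larger the idle. In every case the rule is the same: the larger the idle, the sooner the key is evicted.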
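
The two mappings can be written as a pair of helpers. lfu_score and ttl_score are hypothetical names for this sketch; in Redis the expressions appear inline in evictionPoolPopulate, shown next.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helpers: turn each policy's metric into one "idle"
 * score where a larger value means "evict sooner". */

/* LFU: a low access count gives a high score. */
static unsigned long long lfu_score(uint8_t count) {
    return 255ULL - count;
}

/* volatile-ttl: an earlier expire time gives a high score. */
static unsigned long long ttl_score(long long expire_at_ms) {
    return ULLONG_MAX - (unsigned long long)expire_at_ms;
}
```

A key with count 5 scores higher than one with count 200, and a key expiring at time 1000 scores higher than one expiring at time 2000, so in both cases the better eviction candidate ends up further to the right in the sorted pool.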

void evictionPoolPopulate(int dbid, dict *sampledict, dict *keydict, struct evictionPoolEntry *pool) {
    int j, k, count;
    dictEntry *samples[server.maxmemory_samples];

    count = dictGetSomeKeys(sampledict,samples,server.maxmemory_samples);
    for (j = 0; j < count; j++) {
        unsigned long long idle;
        sds key;
        robj *o;
        dictEntry *de;

        de = samples[j];
        key = dictGetKey(de);

        /* If the dictionary we are sampling from is not the main
         * dictionary (but the expires one) we need to lookup the key
         * again in the key dictionary to obtain the value object. */
        if (server.maxmemory_policy != MAXMEMORY_VOLATILE_TTL) {
            if (sampledict != keydict) de = dictFind(keydict, key);
            o = dictGetVal(de);
        }

        /* Calculate the idle time according to the policy. This is called
         * idle just because the code initially handled LRU, but is in fact
         * just a score where an higher score means better candidate. */
        if (server.maxmemory_policy & MAXMEMORY_FLAG_LRU) {
            idle = estimateObjectIdleTime(o);
        } else if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
            /* When we use an LRU policy, we sort the keys by idle time
             * so that we expire keys starting from greater idle time.
             * However when the policy is an LFU one, we have a frequency
             * estimation, and we want to evict keys with lower frequency
             * first. So inside the pool we put objects using the inverted
             * frequency subtracting the actual frequency to the maximum
             * frequency of 255. */
            idle = 255-LFUDecrAndReturn(o);
        } else if (server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL) {
            /* In this case the sooner the expire the better. */
            idle = ULLONG_MAX - (long)dictGetVal(de);
        } else {
            serverPanic("Unknown eviction policy in evictionPoolPopulate()");
        }

        /* Insert the element inside the pool.
         * First, find the first empty bucket or the first populated
         * bucket that has an idle time smaller than our idle time. */
        k = 0;
        while (k < EVPOOL_SIZE &&
               pool[k].key &&
               pool[k].idle < idle) k++;
        if (k == 0 && pool[EVPOOL_SIZE-1].key != NULL) {
            /* Can't insert if the element is < the worst element we have
             * and there are no empty buckets. */
            continue;
        } else if (k < EVPOOL_SIZE && pool[k].key == NULL) {
            /* Inserting into empty position. No setup needed before insert. */
        } else {
            /* Inserting in the middle. Now k points to the first element
             * greater than the element to insert.  */
            if (pool[EVPOOL_SIZE-1].key == NULL) {
                /* Free space on the right? Insert at k shifting
                 * all the elements from k to end to the right. */

                /* Save SDS before overwriting. */
                sds cached = pool[EVPOOL_SIZE-1].cached;
                memmove(pool+k+1,pool+k,
                    sizeof(pool[0])*(EVPOOL_SIZE-k-1));
                pool[k].cached = cached;
            } else {
                /* No free space on right? Insert at k-1 */
                k--;
                /* Shift all elements on the left of k (included) to the
                 * left, so we discard the element with smaller idle time. */
                sds cached = pool[0].cached; /* Save SDS before overwriting. */
                if (pool[0].key != pool[0].cached) sdsfree(pool[0].key);
                memmove(pool,pool+1,sizeof(pool[0])*k);
                pool[k].cached = cached;
            }
        }

        /* Try to reuse the cached SDS string allocated in the pool entry,
         * because allocating and deallocating this object is costly
         * (according to the profiler, not my fantasy. Remember:
         * premature optimizbla bla bla bla. */
        int klen = sdslen(key);
        if (klen > EVPOOL_CACHED_SDS_SIZE) {
            pool[k].key = sdsdup(key);
        } else {
            memcpy(pool[k].cached,key,klen+1);
            sdssetlen(pool[k].cached,klen);
            pool[k].key = pool[k].cached;
        }
        pool[k].idle = idle;
        pool[k].dbid = dbid;
    }
}
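
The insertion logic is easier to follow with the key handling stripped away. The sketch below keeps only the scores and assumes that occupied slots are always packed to the left, in ascending order; struct score_pool and pool_insert are hypothetical names, and this is a simplification of evictionPoolPopulate rather than a copy of it.

```c
#include <string.h>

#define POOL_SIZE 16  /* same value as EVPOOL_SIZE */

/* Simplified pool that stores only scores, kept in ascending order.
 * used = number of occupied slots, always packed at the left. */
struct score_pool {
    unsigned long long score[POOL_SIZE];
    int used;
};

/* Insert a score, keeping ascending order. When the pool is full the
 * smallest score is dropped, unless the new score is not larger than
 * any existing one, in which case it is rejected.
 * Returns 1 if inserted, 0 if rejected. */
static int pool_insert(struct score_pool *p, unsigned long long score) {
    int k = 0;
    while (k < p->used && p->score[k] < score) k++;

    if (p->used < POOL_SIZE) {
        /* Free space on the right: shift [k, used) one slot right. */
        memmove(&p->score[k + 1], &p->score[k],
                sizeof(p->score[0]) * (size_t)(p->used - k));
        p->score[k] = score;
        p->used++;
        return 1;
    }
    if (k == 0) return 0;   /* worse than every candidate already held */

    /* Pool full: drop slot 0 and shift [1, k) one slot left. */
    k--;
    memmove(&p->score[0], &p->score[1], sizeof(p->score[0]) * (size_t)k);
    p->score[k] = score;
    return 1;
}
```

Because the array stays sorted, the best candidate is always the rightmost occupied slot, which is why the eviction loop walks the pool from EVPOOL_SIZE-1 downwards.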

7. Picking the key to evict from the pool

...
  /* Go backward from best to worst element to evict. */
for (k = EVPOOL_SIZE-1; k >= 0; k--) {
      if (pool[k].key == NULL) continue;
      bestdbid = pool[k].dbid;

      if (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) {
          de = dictFind(server.db[pool[k].dbid].dict,
              pool[k].key);
      } else {
          de = dictFind(server.db[pool[k].dbid].expires,
              pool[k].key);
      }

      /* Remove the entry from the pool. */
      if (pool[k].key != pool[k].cached)
          sdsfree(pool[k].key);
      pool[k].key = NULL;
      pool[k].idle = 0;

      /* If the key exists, is our pick. Otherwise it is
       * a ghost and we need to try the next element. */
      if (de) {
          bestkey = dictGetKey(de);
          break;
      } else {
          /* Ghost... Iterate again. */
      }
  }
  ...

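8. Evicting the key

Asynchronous deletion was introduced (lazyfree-lazy-eviction no, disabled by default). The asynchronous path checks how many elements the value holds: only when there are more than 64 is the value freed by a background task; otherwise it is still freed synchronously.
When asynchronous deletion is enabled, after every 16 freed keys Redis checks whether memory usage has already dropped below the limit.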

...
/* Finally remove the selected key. */
if (bestkey) {
     db = server.db+bestdbid;
     robj *keyobj = createStringObject(bestkey,sdslen(bestkey));
     propagateExpire(db,keyobj,server.lazyfree_lazy_eviction);
     /* We compute the amount of memory freed by db*Delete() alone.
      * It is possible that actually the memory needed to propagate
      * the DEL in AOF and replication link is greater than the one
      * we are freeing removing the key, but we can't account for
      * that otherwise we would never exit the loop.
      *
      * AOF and Output buffer memory will be freed eventually so
      * we only care about memory used by the key space. */
     delta = (long long) zmalloc_used_memory();
     latencyStartMonitor(eviction_latency);
     if (server.lazyfree_lazy_eviction)
         dbAsyncDelete(db,keyobj);
     else
         dbSyncDelete(db,keyobj);
     latencyEndMonitor(eviction_latency);
     latencyAddSampleIfNeeded("eviction-del",eviction_latency);
     latencyRemoveNestedEvent(latency,eviction_latency);
     delta -= (long long) zmalloc_used_memory();
     mem_freed += delta;
     server.stat_evictedkeys++;
     notifyKeyspaceEvent(NOTIFY_EVICTED, "evicted",
         keyobj, db->id);
     decrRefCount(keyobj);
     keys_freed++;

     /* When the memory to free starts to be big enough, we may
      * start spending so much time here that is impossible to
      * deliver data to the slaves fast enough, so we force the
      * transmission here inside the loop. */
     if (slaves) flushSlavesOutputBuffers();

     /* Normally our stop condition is the ability to release
      * a fixed, pre-computed amount of memory. However when we
      * are deleting objects in another thread, it's better to
      * check, from time to time, if we already reached our target
      * memory, since the "mem_freed" amount is computed only
      * across the dbAsyncDelete() call, while the thread can
      * release the memory all the time. */
     if (server.lazyfree_lazy_eviction && !(keys_freed % 16)) {
         overhead = freeMemoryGetNotCountedMemory();
         mem_used = zmalloc_used_memory();
         mem_used = (mem_used > overhead) ? mem_used-overhead : 0;
         if (mem_used <= server.maxmemory) {
             mem_freed = mem_tofree;
         }
     }
 }
...

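9. Improvement to the random policies

Previously, every eviction pass scanned db0, db1, db2 … dbN in that order, so the DBs at the front lost the most keys and the data could become unbalanced across DBs.
The new random policy introduces a static variable next_db, so each pass starts from the DB after the one where the previous pass stopped, and only wraps around to the beginning after a full round.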
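
The effect of the static variable can be seen in isolation with a small sketch. pick_next_db and DB_COUNT are hypothetical names for this example, and the DB sizes are passed in as an array rather than read from server.db.

```c
#define DB_COUNT 4  /* hypothetical number of DBs for the example */

/* Return the index of the next non-empty DB, starting after the one
 * used last time, or -1 if every DB is empty. sizes[i] is the number
 * of keys in DB i. */
static int pick_next_db(const unsigned long sizes[DB_COUNT]) {
    static int next_db = 0;  /* persists across calls */
    for (int i = 0; i < DB_COUNT; i++) {
        int j = (++next_db) % DB_COUNT;
        if (sizes[j] != 0) return j;
    }
    return -1;
}
```

With keys only in DB 0 and DB 2, successive calls return 2, 0, 2, and so on, so the two non-empty DBs take turns instead of DB 0 being hit every time.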

/* volatile-random and allkeys-random policy */
...
static int next_db = 0;
...
 else if (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM ||
            server.maxmemory_policy == MAXMEMORY_VOLATILE_RANDOM)
   {
       /* When evicting a random key, we try to evict a key for
        * each DB, so we use the static 'next_db' variable to
        * incrementally visit all DBs. */
       for (i = 0; i < server.dbnum; i++) {
           j = (++next_db) % server.dbnum;
           db = server.db+j;
           dict = (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM) ?
                   db->dict : db->expires;
           if (dictSize(dict) != 0) {
               de = dictGetRandomKey(dict);
               bestkey = dictGetKey(de);
               bestdbid = j;
               break;
           }
       }
   }

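10. Improvements to the eviction pool

The evictionPoolEntry structure was taken out of the DB and turned into a global pointer.
Previously every DB had its own evictionPoolEntry array. With the default of 16 DBs, most of them were never used, which wasted space.
Previously each DB evicted on its own. Now m keys are sampled from every DB, so one key is evicted out of m*db_num candidates.
Previously the victim was picked from the m samples of a single DB. Now it is picked from the samples of all DBs, which is more likely to evict the key that has gone longest without access.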

11. Other changes

The policies used to be plain enumeration values. They are now defined as bit flags:

/* Redis maxmemory strategies. Instead of using just incremental number
 * for this defines, we use a set of flags so that testing for certain
 * properties common to multiple policies is faster. */
#define MAXMEMORY_FLAG_LRU (1<<0)
#define MAXMEMORY_FLAG_LFU (1<<1)
#define MAXMEMORY_FLAG_ALLKEYS (1<<2)
#define MAXMEMORY_FLAG_NO_SHARED_INTEGERS \
    (MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_LFU)

#define MAXMEMORY_VOLATILE_LRU ((0<<8)|MAXMEMORY_FLAG_LRU)
#define MAXMEMORY_VOLATILE_LFU ((1<<8)|MAXMEMORY_FLAG_LFU)
#define MAXMEMORY_VOLATILE_TTL (2<<8)
#define MAXMEMORY_VOLATILE_RANDOM (3<<8)
#define MAXMEMORY_ALLKEYS_LRU ((4<<8)|MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_ALLKEYS)
#define MAXMEMORY_ALLKEYS_LFU ((5<<8)|MAXMEMORY_FLAG_LFU|MAXMEMORY_FLAG_ALLKEYS)
#define MAXMEMORY_ALLKEYS_RANDOM ((6<<8)|MAXMEMORY_FLAG_ALLKEYS)
#define MAXMEMORY_NO_EVICTION (7<<8)

#define CONFIG_DEFAULT_MAXMEMORY_POLICY MAXMEMORY_NO_EVICTION
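
The benefit of the flag encoding is that one bit test covers several policies at once. The sketch below repeats a subset of the defines so that it compiles on its own; policy_is_lfu and policy_samples_all_keys are hypothetical helper names, since Redis performs these tests inline.

```c
/* Flag and policy values copied from the defines above. */
#define MAXMEMORY_FLAG_LRU (1<<0)
#define MAXMEMORY_FLAG_LFU (1<<1)
#define MAXMEMORY_FLAG_ALLKEYS (1<<2)

#define MAXMEMORY_VOLATILE_LFU ((1<<8)|MAXMEMORY_FLAG_LFU)
#define MAXMEMORY_VOLATILE_TTL (2<<8)
#define MAXMEMORY_ALLKEYS_LRU ((4<<8)|MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_ALLKEYS)
#define MAXMEMORY_ALLKEYS_LFU ((5<<8)|MAXMEMORY_FLAG_LFU|MAXMEMORY_FLAG_ALLKEYS)
#define MAXMEMORY_ALLKEYS_RANDOM ((6<<8)|MAXMEMORY_FLAG_ALLKEYS)

/* True for both volatile-lfu and allkeys-lfu. */
static int policy_is_lfu(int policy) {
    return (policy & MAXMEMORY_FLAG_LFU) != 0;
}

/* True for every allkeys-* policy, false for the volatile-* ones. */
static int policy_samples_all_keys(int policy) {
    return (policy & MAXMEMORY_FLAG_ALLKEYS) != 0;
}
```

The upper bits (the n<<8 part) keep every policy value distinct, while the low bits carry the shared properties, so a check such as server.maxmemory_policy & MAXMEMORY_FLAG_LFU does not need to list each LFU policy by name.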