Redis data eviction policies (part 3): LFU


Redis's LRU eviction considers only when a key was last accessed. After a one-off full scan of the keyspace, many keys that will never be accessed again can linger in memory, while data that is about to be needed gets evicted.
Redis 4.0.0 therefore introduced LFU. LFU builds on LRU: on top of the time dimension it also takes access frequency into account.

typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits decrement time). */
    int refcount;
    void *ptr;
} robj;

The original 24-bit lru field is split into two parts: ldt, which stores the (last decrement) time, and counter, which stores the access count.
counter is only 8 bits, so its maximum value is 255. It therefore cannot be incremented on every access; instead it grows according to a probabilistic policy.
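The split can be illustrated with the plain bit operations involved (a minimal sketch; the helper names are made up, not Redis functions):

```c
#include <assert.h>

/* The 24-bit lru field under LFU: high 16 bits = ldt (last decrement
 * time, in minutes), low 8 bits = logarithmic access counter. */
typedef unsigned int lfu_field_t;

static lfu_field_t lfu_pack(unsigned int ldt, unsigned int counter) {
    return ((ldt & 0xFFFF) << 8) | (counter & 0xFF);
}

static unsigned int lfu_ldt(lfu_field_t lru)     { return lru >> 8; }
static unsigned int lfu_counter(lfu_field_t lru) { return lru & 255; }
```

These mirror the shifts and masks that appear later in createObject and lookupKey.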

1. Eviction pool entry definition

#define EVPOOL_SIZE 16
#define EVPOOL_CACHED_SDS_SIZE 255
struct evictionPoolEntry {
    unsigned long long idle;    /* Object idle time (inverse frequency for LFU) */
    sds key;                    /* Key name. */
    sds cached;                 /* Cached SDS object for key name. */
    int dbid;                   /* Key DB number. */
};

// Global eviction pool pointer
static struct evictionPoolEntry *EvictionPoolLRU;

2. Allocating the eviction pool

/* Create a new eviction pool. */
void evictionPoolAlloc(void) {
    struct evictionPoolEntry *ep;
    int j;

    ep = zmalloc(sizeof(*ep)*EVPOOL_SIZE);
    for (j = 0; j < EVPOOL_SIZE; j++) {
        ep[j].idle = 0;
        ep[j].key = NULL;
        ep[j].cached = sdsnewlen(NULL,EVPOOL_CACHED_SDS_SIZE);
        ep[j].dbid = 0;
    }
    EvictionPoolLRU = ep;
}

There is a single global eviction pool holding candidate keys from all DBs, hence the dbid field that records which DB each key belongs to.
Adding a key to the pool used to require a fresh allocation every time, which causes memory fragmentation and costs performance. The cached field pre-allocates a buffer that can be reused on later insertions, avoiding most allocations; the buffer is 255 bytes, so keys longer than 255 characters still need a dynamic allocation.

3. Object creation

So that newly created objects are not evicted immediately, the counter is initialized to 5 rather than 0:

#define LFU_INIT_VAL 5
robj *createObject(int type, void *ptr) {
    robj *o = zmalloc(sizeof(*o));
    o->type = type;
    o->encoding = OBJ_ENCODING_RAW;
    o->ptr = ptr;
    o->refcount = 1;

    /* Set the LRU to the current lruclock (minutes resolution), or
     * alternatively the LFU counter. */
    if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
        o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL;
    } else {
        o->lru = LRU_CLOCK();
    }
    return o;
}
unsigned long LFUGetTimeInMinutes(void) {
    return (server.unixtime/60) & 65535;
}
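Because only 16 bits of the minute clock are kept, it wraps roughly every 45.5 days (65536 minutes), and elapsed-time computations must account for the wrap, as Redis's LFUTimeElapsed does. A self-contained sketch with `now` passed as a parameter for testability (the helper name is made up):

```c
#include <assert.h>

/* Minutes elapsed since ldt on a clock truncated to 16 bits.
 * Both 'now' and 'ldt' are 16-bit minute values. */
static unsigned long lfu_time_elapsed(unsigned long now, unsigned long ldt) {
    if (now >= ldt) return now - ldt;
    return 65535 - ldt + now;   /* the clock wrapped around */
}
```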

4. Updating the counter on access

Because the counter is only 8 bits (maximum 255), it cannot simply be incremented once per access; a probabilistic, logarithmic increment algorithm is used instead:

# 1. A random number R between 0 and 1 is extracted.
# 2. A probability P is calculated as 1/(old_value*lfu_log_factor+1).
# 3. The counter is incremented only if R < P.
/* Logarithmically increment a counter. The greater the current counter
 * value, the less likely it is to actually be incremented. Saturates at 255. */
uint8_t LFULogIncr(uint8_t counter) {
    if (counter == 255) return 255;
    double r = (double)rand()/RAND_MAX;
    double baseval = counter - LFU_INIT_VAL;
    if (baseval < 0) baseval = 0;
    double p = 1.0/(baseval*server.lfu_log_factor+1);
    if (r < p) counter++;
    return counter;
}
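The increment logic can be exercised standalone to see the saturation behaviour documented in the official table below (a sketch for experimentation; lfu_log_factor is passed as a parameter and the helper names are made up):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LFU_INIT_VAL 5

/* Same logic as LFULogIncr, with the log factor as a parameter
 * instead of a server config field. */
static uint8_t lfu_log_incr(uint8_t counter, unsigned int log_factor) {
    if (counter == 255) return 255;
    double r = (double)rand()/RAND_MAX;
    double baseval = (double)counter - LFU_INIT_VAL;
    if (baseval < 0) baseval = 0;
    double p = 1.0/(baseval*log_factor+1);
    if (r < p) counter++;
    return counter;
}

/* Counter value after n simulated hits, starting from LFU_INIT_VAL. */
static uint8_t lfu_after_hits(long n, unsigned int log_factor) {
    uint8_t c = LFU_INIT_VAL;
    for (long i = 0; i < n; i++) c = lfu_log_incr(c, log_factor);
    return c;
}
```

With factor 0 the counter saturates almost immediately, while with factor 100 it barely moves after 100 hits, matching the table.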

The official configuration file documents the counter values reached after a given number of hits for different lfu-log-factor values:

# +--------+------------+------------+------------+------------+------------+
# | factor | 100 hits   | 1000 hits  | 100K hits  | 1M hits    | 10M hits   |
# +--------+------------+------------+------------+------------+------------+
# | 0      | 104        | 255        | 255        | 255        | 255        |
# +--------+------------+------------+------------+------------+------------+
# | 1      | 18         | 49         | 255        | 255        | 255        |
# +--------+------------+------------+------------+------------+------------+
# | 10     | 10         | 18         | 142        | 255        | 255        |
# +--------+------------+------------+------------+------------+------------+
# | 100    | 8          | 11         | 49         | 143        | 255        |
# +--------+------------+------------+------------+------------+------------+
#

The counter is updated on each key lookup, with two exceptions: it is not updated while an RDB save or AOF rewrite child process is running (to avoid copy-on-write churn), and it is not updated by operations carrying the LOOKUP_NOTOUCH flag (e.g. TYPE, TTL, PTTL, SWAPDB):

robj *lookupKey(redisDb *db, robj *key, int flags) {
    dictEntry *de = dictFind(db->dict,key->ptr);
    if (de) {
        robj *val = dictGetVal(de);

        /* Update the access time for the ageing algorithm.
         * Don't do it if we have a saving child, as this will trigger
         * a copy on write madness. */
        if (server.rdb_child_pid == -1 &&
            server.aof_child_pid == -1 &&
            !(flags & LOOKUP_NOTOUCH))
        {
            if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
                unsigned long ldt = val->lru >> 8;
                unsigned long counter = LFULogIncr(val->lru & 255);
                val->lru = (ldt << 8) | counter;
            } else {
                val->lru = LRU_CLOCK();
            }
        }
        return val;
    } else {
        return NULL;
    }
}

5. Counter decay

#define LFU_DECR_INTERVAL 1
unsigned long LFUDecrAndReturn(robj *o) {
    unsigned long ldt = o->lru >> 8;
    unsigned long counter = o->lru & 255;
    if (LFUTimeElapsed(ldt) >= server.lfu_decay_time && counter) {
        if (counter > LFU_INIT_VAL*2) {
            counter /= 2;
            if (counter < LFU_INIT_VAL*2) counter = LFU_INIT_VAL*2;
        } else {
            counter--;
        }
        o->lru = (LFUGetTimeInMinutes()<<8) | counter;
    }
    return counter;
}

When server.lfu_decay_time minutes (configurable via lfu-decay-time, default 1) pass without an access, the counter is decayed:

  • if the counter is greater than 10 (LFU_INIT_VAL*2), it is halved (but not below 10)
  • if it is 10 or less, it is decremented linearly

For example, a key whose counter has reached the maximum of 255 decays over successive periods as 255 → 127 → 63 → 31 → 15 → 10 → 9 → 8 → …
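A single decay step can be isolated and checked against the halve-then-clamp, then-linear behaviour (a sketch; the helper name is made up, and the ldt handling is omitted):

```c
#include <assert.h>

#define LFU_INIT_VAL 5

/* One decay step, mirroring the body of LFUDecrAndReturn once the
 * decay period has elapsed: halve large counters with a floor of
 * LFU_INIT_VAL*2 (10), then decrement linearly below that. */
static unsigned long lfu_decay_step(unsigned long counter) {
    if (!counter) return 0;
    if (counter > LFU_INIT_VAL*2) {
        counter /= 2;
        if (counter < LFU_INIT_VAL*2) counter = LFU_INIT_VAL*2;
    } else {
        counter--;
    }
    return counter;
}
```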

6. Sampling candidate keys from every DB

Previously each DB had its own eviction pool; now there is a single global pool, and candidate keys from all DBs are written into it.

int freeMemoryIfNeeded(void) {
...
if (server.maxmemory_policy & (MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_LFU) ||
        server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL)
    {
        struct evictionPoolEntry *pool = EvictionPoolLRU;

        while(bestkey == NULL) {
            unsigned long total_keys = 0, keys;

            /* We don't want to make local-db choices when expiring keys,
             * so to start populate the eviction pool sampling keys from
             * every DB. */
            for (i = 0; i < server.dbnum; i++) {
                db = server.db+i;
                dict = (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) ?
                        db->dict : db->expires;
                if ((keys = dictSize(dict)) != 0) {
                    evictionPoolPopulate(i, dict, db->dict, pool);
                    total_keys += keys;
                }
            }
            if (!total_keys) break; /* No keys to evict. */

            /* Go backward from best to worst element to evict. */
            for (k = EVPOOL_SIZE-1; k >= 0; k--) {
                if (pool[k].key == NULL) continue;
                bestdbid = pool[k].dbid;

                if (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) {
                    de = dictFind(server.db[pool[k].dbid].dict,
                        pool[k].key);
                } else {
                    de = dictFind(server.db[pool[k].dbid].expires,
                        pool[k].key);
                }

                /* Remove the entry from the pool. */
                if (pool[k].key != pool[k].cached)
                    sdsfree(pool[k].key);
                pool[k].key = NULL;
                pool[k].idle = 0;

                /* If the key exists, is our pick. Otherwise it is
                 * a ghost and we need to try the next element. */
                if (de) {
                    bestkey = dictGetKey(de);
                    break;
                } else {
                    /* Ghost... Iterate again. */
                }
            }
        }
    }
...
}

All policies evict by sorting on the idle score. For LFU the score is 255 - counter, so a smaller counter yields a larger idle; for volatile-ttl it is ULLONG_MAX - ttl, so a smaller TTL yields a larger idle. In every case, the larger the idle score, the sooner the key is evicted.

void evictionPoolPopulate(int dbid, dict *sampledict, dict *keydict, struct evictionPoolEntry *pool) {
    int j, k, count;
    dictEntry *samples[server.maxmemory_samples];

    count = dictGetSomeKeys(sampledict,samples,server.maxmemory_samples);
    for (j = 0; j < count; j++) {
        unsigned long long idle;
        sds key;
        robj *o;
        dictEntry *de;

        de = samples[j];
        key = dictGetKey(de);

        /* If the dictionary we are sampling from is not the main
         * dictionary (but the expires one) we need to lookup the key
         * again in the key dictionary to obtain the value object. */
        if (server.maxmemory_policy != MAXMEMORY_VOLATILE_TTL) {
            if (sampledict != keydict) de = dictFind(keydict, key);
            o = dictGetVal(de);
        }

        /* Calculate the idle time according to the policy. This is called
         * idle just because the code initially handled LRU, but is in fact
         * just a score where an higher score means better candidate. */
        if (server.maxmemory_policy & MAXMEMORY_FLAG_LRU) {
            idle = estimateObjectIdleTime(o);
        } else if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
            /* When we use an LRU policy, we sort the keys by idle time
             * so that we expire keys starting from greater idle time.
             * However when the policy is an LFU one, we have a frequency
             * estimation, and we want to evict keys with lower frequency
             * first. So inside the pool we put objects using the inverted
             * frequency subtracting the actual frequency to the maximum
             * frequency of 255. */
            idle = 255-LFUDecrAndReturn(o);
        } else if (server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL) {
            /* In this case the sooner the expire the better. */
            idle = ULLONG_MAX - (long)dictGetVal(de);
        } else {
            serverPanic("Unknown eviction policy in evictionPoolPopulate()");
        }

        /* Insert the element inside the pool.
         * First, find the first empty bucket or the first populated
         * bucket that has an idle time smaller than our idle time. */
        k = 0;
        while (k < EVPOOL_SIZE &&
               pool[k].key &&
               pool[k].idle < idle) k++;
        if (k == 0 && pool[EVPOOL_SIZE-1].key != NULL) {
            /* Can't insert if the element is < the worst element we have
             * and there are no empty buckets. */
            continue;
        } else if (k < EVPOOL_SIZE && pool[k].key == NULL) {
            /* Inserting into empty position. No setup needed before insert. */
        } else {
            /* Inserting in the middle. Now k points to the first element
             * greater than the element to insert.  */
            if (pool[EVPOOL_SIZE-1].key == NULL) {
                /* Free space on the right? Insert at k shifting
                 * all the elements from k to end to the right. */

                /* Save SDS before overwriting. */
                sds cached = pool[EVPOOL_SIZE-1].cached;
                memmove(pool+k+1,pool+k,
                    sizeof(pool[0])*(EVPOOL_SIZE-k-1));
                pool[k].cached = cached;
            } else {
                /* No free space on right? Insert at k-1 */
                k--;
                /* Shift all elements on the left of k (included) to the
                 * left, so we discard the element with smaller idle time. */
                sds cached = pool[0].cached; /* Save SDS before overwriting. */
                if (pool[0].key != pool[0].cached) sdsfree(pool[0].key);
                memmove(pool,pool+1,sizeof(pool[0])*k);
                pool[k].cached = cached;
            }
        }

        /* Try to reuse the cached SDS string allocated in the pool entry,
         * because allocating and deallocating this object is costly
         * (according to the profiler, not my fantasy. Remember:
         * premature optimizbla bla bla bla. */
        int klen = sdslen(key);
        if (klen > EVPOOL_CACHED_SDS_SIZE) {
            pool[k].key = sdsdup(key);
        } else {
            memcpy(pool[k].cached,key,klen+1);
            sdssetlen(pool[k].cached,klen);
            pool[k].key = pool[k].cached;
        }
        pool[k].idle = idle;
        pool[k].dbid = dbid;
    }
}
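The ordered-insert logic in evictionPoolPopulate can be reduced to a score-only sketch (keys and the cached-SDS handling omitted; the pool size is shrunk from 16 to 4 for illustration, and the names are made up):

```c
#include <assert.h>
#include <string.h>

#define POOL_SIZE 4   /* EVPOOL_SIZE is 16 in Redis */

/* Simplified pool: idle scores in ascending order, so the best
 * eviction candidate (largest idle) sits at the end. */
struct pool { unsigned long long idle[POOL_SIZE]; int used; };

/* Insert a score keeping ascending order; when the pool is full,
 * the smallest score is discarded, mirroring the memmove dance in
 * evictionPoolPopulate. */
static void pool_insert(struct pool *p, unsigned long long idle) {
    int k = 0;
    while (k < p->used && p->idle[k] < idle) k++;
    if (p->used == POOL_SIZE) {
        if (k == 0) return;             /* worse than everything kept */
        k--;                            /* drop slot 0, shift left */
        memmove(p->idle, p->idle+1, sizeof(p->idle[0])*k);
    } else {                            /* shift right, grow */
        memmove(p->idle+k+1, p->idle+k, sizeof(p->idle[0])*(p->used-k));
        p->used++;
    }
    p->idle[k] = idle;
}
```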

7. Picking the eviction victim from the pool

...
  /* Go backward from best to worst element to evict. */
for (k = EVPOOL_SIZE-1; k >= 0; k--) {
      if (pool[k].key == NULL) continue;
      bestdbid = pool[k].dbid;

      if (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) {
          de = dictFind(server.db[pool[k].dbid].dict,
              pool[k].key);
      } else {
          de = dictFind(server.db[pool[k].dbid].expires,
              pool[k].key);
      }

      /* Remove the entry from the pool. */
      if (pool[k].key != pool[k].cached)
          sdsfree(pool[k].key);
      pool[k].key = NULL;
      pool[k].idle = 0;

      /* If the key exists, is our pick. Otherwise it is
       * a ghost and we need to try the next element. */
      if (de) {
          bestkey = dictGetKey(de);
          break;
      } else {
          /* Ghost... Iterate again. */
      }
  }
  ...

8. Evicting the key

Redis 4 introduces asynchronous deletion (lazyfree-lazy-eviction, disabled by default). Even when it is enabled, dbAsyncDelete only hands the job to a background thread when the estimated number of elements to free exceeds 64; smaller values are still deleted synchronously.
With lazy eviction enabled, the loop also re-checks every 16 freed keys whether memory usage has already dropped below the limit, since the background thread may be releasing memory concurrently.

...
/* Finally remove the selected key. */
if (bestkey) {
     db = server.db+bestdbid;
     robj *keyobj = createStringObject(bestkey,sdslen(bestkey));
     propagateExpire(db,keyobj,server.lazyfree_lazy_eviction);
     /* We compute the amount of memory freed by db*Delete() alone.
      * It is possible that actually the memory needed to propagate
      * the DEL in AOF and replication link is greater than the one
      * we are freeing removing the key, but we can't account for
      * that otherwise we would never exit the loop.
      *
      * AOF and Output buffer memory will be freed eventually so
      * we only care about memory used by the key space. */
     delta = (long long) zmalloc_used_memory();
     latencyStartMonitor(eviction_latency);
     if (server.lazyfree_lazy_eviction)
         dbAsyncDelete(db,keyobj);
     else
         dbSyncDelete(db,keyobj);
     latencyEndMonitor(eviction_latency);
     latencyAddSampleIfNeeded("eviction-del",eviction_latency);
     latencyRemoveNestedEvent(latency,eviction_latency);
     delta -= (long long) zmalloc_used_memory();
     mem_freed += delta;
     server.stat_evictedkeys++;
     notifyKeyspaceEvent(NOTIFY_EVICTED, "evicted",
         keyobj, db->id);
     decrRefCount(keyobj);
     keys_freed++;

     /* When the memory to free starts to be big enough, we may
      * start spending so much time here that is impossible to
      * deliver data to the slaves fast enough, so we force the
      * transmission here inside the loop. */
     if (slaves) flushSlavesOutputBuffers();

     /* Normally our stop condition is the ability to release
      * a fixed, pre-computed amount of memory. However when we
      * are deleting objects in another thread, it's better to
      * check, from time to time, if we already reached our target
      * memory, since the "mem_freed" amount is computed only
      * across the dbAsyncDelete() call, while the thread can
      * release the memory all the time. */
     if (server.lazyfree_lazy_eviction && !(keys_freed % 16)) {
         overhead = freeMemoryGetNotCountedMemory();
         mem_used = zmalloc_used_memory();
         mem_used = (mem_used > overhead) ? mem_used-overhead : 0;
         if (mem_used <= server.maxmemory) {
             mem_freed = mem_tofree;
         }
     }
 }
...

9. Improvements to the random policies

Previously, every eviction pass walked the DBs in order db0, db1, …, dbN, so the first DBs had the most keys deleted and the data across DBs could become unbalanced.
The new random policies keep a static next_db cursor: each pass resumes from where the previous one stopped, and only after a full round does scanning start again from the beginning.
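The cursor behaviour can be sketched in isolation (DBNUM and the helper name are made up for illustration):

```c
#include <assert.h>

#define DBNUM 16   /* server.dbnum defaults to 16 */

/* Static cursor: each call visits the next DB, wrapping around,
 * so successive eviction passes do not always start at db0. */
static int next_db = 0;

static int pick_next_db(void) {
    return (++next_db) % DBNUM;
}
```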

/* volatile-random and allkeys-random policy */
...
static int next_db = 0;
...
 else if (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM ||
            server.maxmemory_policy == MAXMEMORY_VOLATILE_RANDOM)
   {
       /* When evicting a random key, we try to evict a key for
        * each DB, so we use the static 'next_db' variable to
        * incrementally visit all DBs. */
       for (i = 0; i < server.dbnum; i++) {
           j = (++next_db) % server.dbnum;
           db = server.db+j;
           dict = (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM) ?
                   db->dict : db->expires;
           if (dictSize(dict) != 0) {
               de = dictGetRandomKey(dict);
               bestkey = dictGetKey(de);
               bestdbid = j;
               break;
           }
       }
   }

10. Eviction pool improvements

  • The evictionPoolEntry structure was removed from each DB and replaced by a single global pointer
    Previously every DB (16 by default) had its own pool, most of which went unused, wasting memory
  • Eviction used to work per DB; now maxmemory_samples keys are sampled from every DB, so one victim is chosen from roughly m * db_num candidates
    Sampling across all DBs instead of within a single DB makes it more likely that the globally coldest key is evicted

11. Other changes

  • The policy constants, previously simple incremental enum values, are now encoded with flag bits
/* Redis maxmemory strategies. Instead of using just incremental number
 * for this defines, we use a set of flags so that testing for certain
 * properties common to multiple policies is faster. */
#define MAXMEMORY_FLAG_LRU (1<<0)
#define MAXMEMORY_FLAG_LFU (1<<1)
#define MAXMEMORY_FLAG_ALLKEYS (1<<2)
#define MAXMEMORY_FLAG_NO_SHARED_INTEGERS \
    (MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_LFU)

#define MAXMEMORY_VOLATILE_LRU ((0<<8)|MAXMEMORY_FLAG_LRU)
#define MAXMEMORY_VOLATILE_LFU ((1<<8)|MAXMEMORY_FLAG_LFU)
#define MAXMEMORY_VOLATILE_TTL (2<<8)
#define MAXMEMORY_VOLATILE_RANDOM (3<<8)
#define MAXMEMORY_ALLKEYS_LRU ((4<<8)|MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_ALLKEYS)
#define MAXMEMORY_ALLKEYS_LFU ((5<<8)|MAXMEMORY_FLAG_LFU|MAXMEMORY_FLAG_ALLKEYS)
#define MAXMEMORY_ALLKEYS_RANDOM ((6<<8)|MAXMEMORY_FLAG_ALLKEYS)
#define MAXMEMORY_NO_EVICTION (7<<8)

#define CONFIG_DEFAULT_MAXMEMORY_POLICY MAXMEMORY_NO_EVICTION
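The point of the flag encoding is that testing a property shared by several policies becomes a single bitwise AND rather than a chain of equality checks, e.g. (a minimal sketch reusing the defines above; the helper name is made up):

```c
#include <assert.h>

#define MAXMEMORY_FLAG_LRU (1<<0)
#define MAXMEMORY_FLAG_LFU (1<<1)
#define MAXMEMORY_FLAG_ALLKEYS (1<<2)

#define MAXMEMORY_VOLATILE_LFU ((1<<8)|MAXMEMORY_FLAG_LFU)
#define MAXMEMORY_ALLKEYS_LFU ((5<<8)|MAXMEMORY_FLAG_LFU|MAXMEMORY_FLAG_ALLKEYS)
#define MAXMEMORY_ALLKEYS_RANDOM ((6<<8)|MAXMEMORY_FLAG_ALLKEYS)

/* "Is this any LFU policy?" is one AND, regardless of how many
 * LFU variants exist. */
static int policy_is_lfu(int policy) {
    return (policy & MAXMEMORY_FLAG_LFU) != 0;
}
```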