Redis module RedisBloom

Install

布隆过滤器应该不陌生了, 之前使用的版本是 Gem 包封装 BitMap 来实现的, 效率上存在一些问题.

Redis 从4.0开始支持用模块来扩展Redis的功能, 今天来试试 Redis 模块中对 Bloom 的支持: RedisBloom .

免安装的方法是使用现成的 Docker 镜像:

docker run -p 6379:6379 --name redis-redisbloom redislabs/rebloom:latest
docker exec -it redis-redisbloom bash
# redis-cli
# 127.0.0.1:6379> 

由于编译 Redis 实在是很方便, 这里选择用源码编译:

1) 下载&编译 redis 本身

git clone --branch 5.0 --depth 1 https://github.com/antirez/redis.git
cd redis
make

2) 下载&编译 RedisBloom 模块

git clone --depth 1 https://github.com/RedisLabsModules/redisbloom.git
cd redisbloom
make

这时会得到 ./redisbloom/rebloom.so, 启动 redis-server 的时候加载该 .so 文件:

cd ..
./src/redis-server --port 7777 --loadmodule ./redisbloom/rebloom.so

./src/redis-cli -p 7777

Bloom Filter

该实现的 Bloom 算法默认错误率为 0.01 , 初始容量为 100 .

启动 redis-server 的时候, 可以传入参数来指定这两个值:

./src/redis-server --port 7777 --loadmodule ./redisbloom/rebloom.so INITIAL_SIZE 400 ERROR_RATE 0.004

正因为有如上的默认参数, 我们可以开箱即用地使用 BF.ADD 命令来对一个key添加元素标记.

但是考虑到具体的使用场景需要的容错率和元素容量不同, 我们最好用 BF.RESERVE 来指定一个Bloom的数据结构.

初始化一个容错率为 0.1% , 设计容量为 1000000 的Bloom:

BF.RESERVE sample_key 0.001 1000000

占用情况如下:

127.0.0.1:7777> DEBUG OBJECT sample_key
Value at:0x7f887a500650 refcount:1 encoding:raw serializedlength:23898 lru:10598970 lru_seconds_idle:426

serializedlength 的单位是 byte .

Bloom 并不像String或者Hash那样可以自动扩展, 如果实际的元素数量超出计划值, 就要拿原始数据重建Bloom.

具体参数选择可以参考这个布隆过滤器计算器: https://krisives.github.io/bloom-calculator/ .

添加元素:

127.0.0.1:7777> BF.ADD sample_key k1
(integer) 1
127.0.0.1:7777> BF.ADD sample_key k2
(integer) 1
127.0.0.1:7777> BF.ADD sample_key k3
(integer) 1
127.0.0.1:7777> BF.ADD sample_key k3
(integer) 0

一次添加多个:

127.0.0.1:7777> BF.MADD sample_key k4 k5 k6
1) (integer) 1
2) (integer) 1
3) (integer) 1

查询是否存在:

127.0.0.1:7777> BF.EXISTS sample_key k3
(integer) 1
127.0.0.1:7777> BF.EXISTS sample_key kx
(integer) 0

一次性查询多个是否存在:

127.0.0.1:7777> BF.MEXISTS sample_key k1 k3 kx
1) (integer) 1
2) (integer) 1
3) (integer) 0

插入元素, 结合RESERVE和ADD功能:

添加一个或多个:

BF.INSERT sample_key ITEMS k1 k100
1) (integer) 0
2) (integer) 1

如果 key 不存在, 就使用参数中的 CAPACITY 和 ERROR 初始化一个新的 Bloom:

127.0.0.1:7777> BF.INSERT new_sample_key CAPACITY 1000 ITEMS k1 k100
1) (integer) 1
2) (integer) 1
127.0.0.1:7777> del new_sample_key
(integer) 1
127.0.0.1:7777> BF.INSERT new_sample_key CAPACITY 1000 ERROR 0.01 ITEMS k1 k100
1) (integer) 1
2) (integer) 1

只允许使用已存在的key, 不许创建新的key:

127.0.0.1:7777> BF.INSERT new_sample_key CAPACITY 1000 ERROR 0.01 NOCREATE ITEMS k1 k100
1) (integer) 0
2) (integer) 0
127.0.0.1:7777> BF.INSERT new_new_sample_key CAPACITY 1000 ERROR 0.01 NOCREATE ITEMS k1 k100
(error) ERR not found
127.0.0.1:7777> BF.INSERT new_new_sample_key CAPACITY 1000 NOCREATE ITEMS k1 k100
(error) ERR not found
127.0.0.1:7777> BF.INSERT new_new_sample_key NOCREATE ITEMS k1 k100
(error) ERR not found

还有两个处理大Bloom的命令: BF.SCANDUMP 和 BF.LOADCHUNK .