NLP | Storing Frequency Distribution in Redis

Last Updated : 12 Jun, 2019
The nltk.probability.FreqDist class is used in many classes throughout NLTK for storing and managing frequency distributions. It's quite useful, but it keeps everything in memory and doesn't provide a way to persist the data. A single FreqDist is also not accessible to multiple processes. All of that can be changed by building a FreqDist on top of Redis.

What is Redis?
  • Redis is a data structure server that is one of the more popular NoSQL databases.
  • Among other things, it provides a network-accessible database for storing dictionaries (also known as hash maps).
  • Building a FreqDist interface to a Redis hash map will allow us to create a persistent FreqDist that is accessible to multiple local and remote processes at the same time.
How it works?
  • The FreqDist class extends the standard library collections.Counter class, which makes a FreqDist a small wrapper with a few extra methods, such as N().
  • The N() method returns the number of sample outcomes, which is the sum of all the values in the frequency distribution.
  • An API-compatible class is created on top of Redis by extending a RedisHashMap and then implementing the N() method.
  • The RedisHashFreqDist (defined in redisprob.py) sums all the values in the hash map for the N() method.
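The role of N() can be illustrated without NLTK or a Redis server. The following is only a minimal sketch: the class name CounterFreqDist is hypothetical and mimics just the one FreqDist method discussed above, built directly on collections.Counter.

```python
from collections import Counter


class CounterFreqDist(Counter):
    """Hypothetical stand-in for FreqDist: a Counter plus an N() method."""

    def N(self):
        # Total number of sample outcomes = sum of all recorded counts.
        return sum(self.values())


fd = CounterFreqDist()
for word in ["foo", "bar", "foo"]:
    fd[word] += 1  # a Counter treats missing keys as 0

total = fd.N()
print(fd["foo"], fd["baz"], total)
```

The Redis-backed version described in this article keeps the same interface, but reads the counts from a Redis hash map instead of an in-process dictionary.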
Code : Explaining the working
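from rediscollections import RedisHashMap

class RedisHashFreqDist(RedisHashMap):
    def N(self):
        # Total number of sample outcomes: the sum of all stored counts.
        return int(sum(self.values()))

    def __missing__(self, key):
        # Unseen samples have a count of 0.
        return 0

    def __getitem__(self, key):
        # A missing field yields a falsy value, so fall back to 0
        # before converting to int.
        return int(RedisHashMap.__getitem__(self, key) or 0)

    def values(self):
        # Convert every stored value to int.
        return [int(v) for v in RedisHashMap.values(self)]

    def items(self):
        # Convert the value of every (key, value) pair to int.
        return [(k, int(v)) for (k, v) in RedisHashMap.items(self)]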
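The `or 0` in `__getitem__` is what lets `rhfd['foo'] += 1` work for a key that has never been stored. A small sketch of that conversion on its own, assuming the underlying lookup returns None for a missing field and bytes or str for a stored one (the helper name to_count is hypothetical):

```python
def to_count(raw):
    """Convert a raw stored value to an int count (hypothetical helper)."""
    # None (missing field) is falsy, so it falls back to 0.
    # int() accepts str as well as bytes holding an integer literal.
    return int(raw or 0)


missing = to_count(None)
from_bytes = to_count(b"3")
from_str = to_count("7")
print(missing, from_bytes, from_str)
```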
This class can be used just like a FreqDist. To instantiate it, pass a Redis connection and the name of our hash map. The name should be a unique reference to this particular FreqDist so that it doesn't clash with any other keys in Redis.

Code:
from redis import Redis
from redisprob import RedisHashFreqDist

r = Redis()
rhfd = RedisHashFreqDist(r, 'test')
print(len(rhfd))

rhfd['foo'] += 1
print(rhfd['foo'])

rhfd.items()
print(len(rhfd))
Output :
0
1
1
Most of the work is done in the RedisHashMap class, which extends collections.abc.MutableMapping (collections.MutableMapping in older Python versions) and then overrides all methods that require Redis-specific commands.

Outline of each method that uses a specific Redis command:
  • __len__(): This uses the hlen command to get the number of elements in the hash map
  • __contains__(): This uses the hexists command to check if an element exists in the hash map
  • __getitem__(): This uses the hget command to get a value from the hash map
  • __setitem__(): This uses the hset command to set a value in the hash map
  • __delitem__(): This uses the hdel command to remove a value from the hash map
  • keys(): This uses the hkeys command to get all the keys in the hash map
  • values(): This uses the hvals command to get all the values in the hash map
  • items(): This uses the hgetall command to get a dictionary containing all the keys and values in the hash map
  • clear(): This uses the delete command to remove the entire hash map from Redis
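The method-to-command mapping above can be sketched as follows. This is not the actual rediscollections.RedisHashMap source. Both class names are hypothetical, and FakeRedis is an in-memory stand-in that only imitates the hash commands named in the list, so the example runs without a Redis server. Unlike a real Redis client, it stores Python objects as-is rather than bytes.

```python
from collections.abc import MutableMapping


class FakeRedis:
    """Hypothetical in-memory stand-in for the Redis hash commands used below."""

    def __init__(self):
        self._hashes = {}

    def hlen(self, name):
        return len(self._hashes.get(name, {}))

    def hexists(self, name, key):
        return key in self._hashes.get(name, {})

    def hget(self, name, key):
        return self._hashes.get(name, {}).get(key)  # None if missing

    def hset(self, name, key, value):
        self._hashes.setdefault(name, {})[key] = value

    def hdel(self, name, key):
        return int(self._hashes.get(name, {}).pop(key, None) is not None)

    def hkeys(self, name):
        return list(self._hashes.get(name, {}))

    def hvals(self, name):
        return list(self._hashes.get(name, {}).values())

    def hgetall(self, name):
        return dict(self._hashes.get(name, {}))

    def delete(self, name):
        self._hashes.pop(name, None)


class SketchRedisHashMap(MutableMapping):
    """Hypothetical sketch: each mapping method delegates to one hash command."""

    def __init__(self, r, name):
        self._r = r
        self._name = name

    def __len__(self):
        return self._r.hlen(self._name)

    def __contains__(self, key):
        return self._r.hexists(self._name, key)

    def __getitem__(self, key):
        return self._r.hget(self._name, key)

    def __setitem__(self, key, val):
        self._r.hset(self._name, key, val)

    def __delitem__(self, key):
        self._r.hdel(self._name, key)

    def __iter__(self):
        return iter(self.keys())

    def keys(self):
        return self._r.hkeys(self._name)

    def values(self):
        return self._r.hvals(self._name)

    def items(self):
        return list(self._r.hgetall(self._name).items())

    def clear(self):
        # Remove the whole hash in one command instead of key by key.
        self._r.delete(self._name)


hm = SketchRedisHashMap(FakeRedis(), "test")
hm["foo"] = 1
hm["bar"] = 2
size_before = len(hm)
has_foo = "foo" in hm
del hm["foo"]
remaining = hm.items()
hm.clear()
size_after = len(hm)
```

Because every operation goes through the connection object rather than local state, swapping FakeRedis for a real client connection would make the same mapping visible to any process that uses the same hash name.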
