What is it?
This is a way to add tags functionality to classes derived from ndb.Model on
Google App Engine by way of a mixin.
Why is it important?
Adding tags to data models is a popular phenomenon, but there does not appear to be a published way to accomplish the common functionality with ndb. This method ought to be reusable and performant.
How does it work?
Get the full gist is gist.github.com.
This concept has two ndb.Model derivates, the Tag and the TagMixin.
Design
The requirements here for me were as follows:
- The tags be stored with the model (i.e. can be accessed without “joins”)
- The
Taginstances be updated automatically when a model is updated - The updates apply only on changed
Taginstances - The update occur in parallel
I chose to store tags on the mixin as a repeated ndb.StringProperty, with a separate Tag model whose key can be generated from the tag (i.e. the tag blue becomes ndb.Key('Tag', 'tag__blue')). The Tag instances are updated in parallel by tasklets called from ndb model hooks, as seem below.
The Tag
This Tag class that keeps track of the popularity (count) of a
tag and linked items.
from google.appengine.ext import ndb
MAX_TAGS_FOR_TAGGABLE = 1000
POPULAR_PAGE_SIZE = 30
class Tag(ndb.Model):
"""Keep track of data related to a tag added with the TagMixin class.
"""
tag = ndb.StringProperty(required=True, indexed=True)
count = ndb.IntegerProperty(default=0, indexed=True)
linked = ndb.KeyProperty(repeated=True)
created = ndb.DateTimeProperty(auto_now_add=True)
modified = ndb.DateTimeProperty(auto_now=True)
@staticmethod
def tag_to_keyname(tag):
return "tag__{}".format(tag.lower())
@staticmethod
def tag_to_key(tag):
return ndb.Key("Tag", Tag.tag_to_keyname(tag))
@classmethod
def get_linked_by_tag(self, tag, limit=MAX_TAGS_FOR_TAGGABLE):
"""Return the set of keys for the given tag"""
try:
return Tag.tag_to_key(tag).get().linked
except AttributeError:
return []
@classmethod
def get_or_create_async(cls, tag):
"""Return a future for a Tag instance for the given tag
"""
return Tag.get_or_insert_async(Tag.tag_to_keyname(tag), tag=tag)
@classmethod
def get_popular_query(cls, page_size=POPULAR_PAGE_SIZE):
return Tag.query().order(-Tag.count)
def unlink_async(self, key):
self.linked.remove(key)
self.count -= 1
return self.put_async()
def link_async(self, key):
self.linked.append(key)
self.count += 1
return self.put_async()The TagMixin
The mixin that one would use to add tags to models is as follows:
class TagMixin(object):
"""A mixin that adds taggability to a class.
Adds a 'tags' property.
"""
tags = ndb.StringProperty(repeated=True, indexed=True,
validator=lambda p, v: v.lower())
def _post_get_hook(self, future):
"""Set the _tm_tags so we can compare for changes in pre_put
"""
self._tm_tags = future.get_result().tags
def _post_put_hook(self, future):
"""Modify the associated Tag instances to reflect any updates
"""
old_tagset = set(getattr(self, '_tm_tags', []))
new_tagset = set(self.tags)
# These are tags that have changed
added_tags = new_tagset - old_tagset
deleted_tags = old_tagset - new_tagset
# Get the key for this post
self_key = future.get_result()
@ndb.transactional_tasklet
def update_changed(tag):
tag_instance = yield Tag.get_or_create_async(tag)
if tag in added_tags:
yield tag_instance.link_async(self_key)
else:
yield tag_instance.unlink_async(self_key)
ndb.Future.wait_all([
update_changed(tag) for tag in added_tags | deleted_tags
])
# Update for any successive puts on this model.
self._tm_tags = self.tagsUsage
Given a model with the mixin like this:
>>> class TagModel(TagMixin, ndb.Model):
... name = ndb.StringProperty() # just an arbitrary property
# And an instance, for illustration
>>> tm = TagModel()
# Getting and setting tags is done as normal
# for a property of an `ndb.Model`:
>>> TagModel(tags=["new", "hot"]).put()
>>> tm.tags = ["hot", "plasma"]
>>> tm.put()
>>> len(tm.tags)
2
# How many tags are there? "hot", "new", "plasma"
>>> Tag.query().count()
3
# Get the `ndb.Key` instances for models for a given tag:
>>> Tag.get_linked_by_tag("plasma")
ndb.Key('TagModel', 0)
# Query the tags by popularity:
>>> popular_tags = Tag.get_popular_query().fetch(3)
# Convert t list of `Tag` instances to strings with e.g.
>>> [t.tag for t in Tag.popular_tags]
['hot', 'plasma', 'new']It is straightforward to add for example getting recently updated or old tags
with: Tag.query().order(-Tag.modified)
The unit tests in the associated gist illustrate usage and expectations in better detail.
Careful Be mindful that the mixin uses the hooks _post_get_hook and
_post_put_hook. If your models also use these you will need a way to call them
e.g. calling from your hooks the respective
self.__class__.__bases__[0]._post_{get,put}_hook, where 0 corresponds to the index
of TagMixin in the __bases__. One could call the hooks for
all immediate ancestors with something like this:
def _post_put_hook(self, future):
# My hook stuff
for base in self.__class__.bases:
try:
other_hook = types.MethodType(
getattr(base, '_post_put_hook'), self, TaggedClass
)
except AttributeError:
continue
other_hook(future)If the hooks are buried deeper than immediate ancestors one would have to recurse through __class__.__bases__.
License
Code above is licensed under the MIT https://brianmhunt.mit-license.org/ license.