What is it?
This is a way to add tagging functionality to classes derived from ndb.Model on Google App Engine by way of a mixin.
Why is it important?
Adding tags to data models is common, but there does not appear to be a published way to accomplish it with ndb. This method ought to be reusable and performant.
How does it work?
The full gist is on gist.github.com.
This concept has two parts: the Tag model, which derives from ndb.Model, and the TagMixin.
Design
The requirements here for me were as follows:
- The tags be stored with the model (i.e. can be accessed without “joins”)
- The Tag instances be updated automatically when a model is updated
- The updates apply only to changed Tag instances
- The updates occur in parallel
I chose to store tags on the mixin as a repeated ndb.StringProperty, with a separate Tag model whose key can be generated from the tag (i.e. the tag blue becomes ndb.Key('Tag', 'tag__blue')). The Tag instances are updated in parallel by tasklets called from ndb model hooks, as seen below.
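The key-naming scheme can be tried without App Engine. The following is a minimal sketch in plain Python, assuming only the tag__ prefix and the lower-casing described in this post. The function tag_to_keyname mirrors the Tag method of the same name, while group_by_keyname is a hypothetical helper added purely for illustration. It suggests why tags that differ only in case end up sharing a single Tag entity.

```python
from collections import defaultdict


def tag_to_keyname(tag):
    """Derive a deterministic, case-insensitive key name from a tag."""
    return "tag__{}".format(tag.lower())


def group_by_keyname(tags):
    """Hypothetical helper: group raw tags by the key name they map to."""
    groups = defaultdict(list)
    for tag in tags:
        groups[tag_to_keyname(tag)].append(tag)
    return dict(groups)


# "Blue" and "blue" collapse onto the same key name.
groups = group_by_keyname(["Blue", "blue", "Hot"])
print(sorted(groups))
```

Because the key name is computed from the tag itself, looking up a tag needs no query, only a get by key.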
The Tag
The Tag class keeps track of the popularity (count) of a tag and the items linked to it.
from google.appengine.ext import ndb

MAX_TAGS_FOR_TAGGABLE = 1000
POPULAR_PAGE_SIZE = 30


class Tag(ndb.Model):
    """Keep track of data related to a tag added with the TagMixin class.
    """
    tag = ndb.StringProperty(required=True, indexed=True)
    count = ndb.IntegerProperty(default=0, indexed=True)
    linked = ndb.KeyProperty(repeated=True)
    created = ndb.DateTimeProperty(auto_now_add=True)
    modified = ndb.DateTimeProperty(auto_now=True)

    @staticmethod
    def tag_to_keyname(tag):
        return "tag__{}".format(tag.lower())

    @staticmethod
    def tag_to_key(tag):
        return ndb.Key("Tag", Tag.tag_to_keyname(tag))

    @classmethod
    def get_linked_by_tag(cls, tag, limit=MAX_TAGS_FOR_TAGGABLE):
        """Return the list of keys linked to the given tag"""
        try:
            return cls.tag_to_key(tag).get().linked
        except AttributeError:
            # No Tag entity exists for this tag (get() returned None)
            return []

    @classmethod
    def get_or_create_async(cls, tag):
        """Return a future for a Tag instance for the given tag
        """
        return cls.get_or_insert_async(cls.tag_to_keyname(tag), tag=tag)

    @classmethod
    def get_popular_query(cls, page_size=POPULAR_PAGE_SIZE):
        return cls.query().order(-cls.count)

    def unlink_async(self, key):
        self.linked.remove(key)
        self.count -= 1
        return self.put_async()

    def link_async(self, key):
        self.linked.append(key)
        self.count += 1
        return self.put_async()
The TagMixin
The mixin that one would use to add tags to models is as follows:
class TagMixin(object):
    """A mixin that adds taggability to a class.

    Adds a 'tags' property.
    """
    tags = ndb.StringProperty(repeated=True, indexed=True,
                              validator=lambda p, v: v.lower())

    @classmethod
    def _post_get_hook(cls, key, future):
        """Set the _tm_tags so we can compare for changes in _post_put_hook
        """
        # ndb calls this hook on the class, passing the key and the future
        entity = future.get_result()
        if entity is not None:
            entity._tm_tags = list(entity.tags)

    def _post_put_hook(self, future):
        """Modify the associated Tag instances to reflect any updates
        """
        old_tagset = set(getattr(self, '_tm_tags', []))
        new_tagset = set(self.tags)

        # These are tags that have changed
        added_tags = new_tagset - old_tagset
        deleted_tags = old_tagset - new_tagset

        # Get the key for this post
        self_key = future.get_result()

        @ndb.transactional_tasklet
        def update_changed(tag):
            tag_instance = yield Tag.get_or_create_async(tag)
            if tag in added_tags:
                yield tag_instance.link_async(self_key)
            else:
                yield tag_instance.unlink_async(self_key)

        ndb.Future.wait_all([
            update_changed(tag) for tag in added_tags | deleted_tags
        ])

        # Update for any successive puts on this model. Store a copy so
        # that later in-place edits of self.tags are still detected.
        self._tm_tags = list(self.tags)
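The bookkeeping done in _post_put_hook can be followed without a datastore. Below is a rough sketch in plain Python in which a dict stands in for the Tag entities. The names apply_tag_changes and tag_index are hypothetical and not part of the gist, and nothing here is parallel or transactional. It only illustrates that tags present both before and after a put are left untouched.

```python
def apply_tag_changes(tag_index, item_key, old_tags, new_tags):
    """Update an in-memory stand-in for the Tag entities.

    tag_index maps a tag to the list of item keys linked to it.
    Only tags that were added or removed are touched.
    """
    old_tagset, new_tagset = set(old_tags), set(new_tags)
    added = new_tagset - old_tagset
    deleted = old_tagset - new_tagset
    for tag in added:
        tag_index.setdefault(tag, []).append(item_key)
    for tag in deleted:
        tag_index[tag].remove(item_key)
    return added, deleted


tag_index = {}
# First put: both tags are new for this item.
apply_tag_changes(tag_index, "item-1", [], ["new", "hot"])
# Second put: "new" is dropped, "plasma" is added, "hot" is unchanged.
added, deleted = apply_tag_changes(
    tag_index, "item-1", ["new", "hot"], ["hot", "plasma"])
print(sorted(added), sorted(deleted))
```

As in the mixin, a tag that loses its last item keeps its (now empty) entry rather than being deleted.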
Usage
Given a model with the mixin like this:
>>> class TagModel(TagMixin, ndb.Model):
...     name = ndb.StringProperty()  # just an arbitrary property

# And an instance, for illustration
>>> tm = TagModel()

# Getting and setting tags is done as normal
# for a property of an `ndb.Model`:
>>> other_key = TagModel(tags=["new", "hot"]).put()
>>> tm.tags = ["hot", "plasma"]
>>> tm_key = tm.put()
>>> len(tm.tags)
2

# How many tags are there? "hot", "new", "plasma"
>>> Tag.query().count()
3

# Get the `ndb.Key` instances for models for a given tag:
>>> Tag.get_linked_by_tag("plasma") == [tm_key]
True

# Query the tags by popularity:
>>> popular_tags = Tag.get_popular_query().fetch(3)

# Convert the list of `Tag` instances to strings with e.g.
>>> [t.tag for t in popular_tags]
['hot', 'plasma', 'new']
It is straightforward to add other queries, for example getting recently updated tags with Tag.query().order(-Tag.modified), or the least recently updated ones with Tag.query().order(Tag.modified).
The unit tests in the associated gist illustrate usage and expectations in better detail.
Careful: be mindful that the mixin uses the hooks _post_get_hook and _post_put_hook. If your models also use these hooks, you will need a way to call the mixin's versions, e.g. by calling from your hooks the respective self.__class__.__bases__[0]._post_{get,put}_hook, where 0 corresponds to the index of TagMixin in __bases__. One could call the hooks for all immediate ancestors with something like this:
def _post_put_hook(self, future):
    # My hook stuff
    for base in self.__class__.__bases__:
        other_hook = getattr(base, '_post_put_hook', None)
        if other_hook is None:
            continue
        # Call the base's hook explicitly on this instance
        other_hook(self, future)
If the hooks are buried deeper than immediate ancestors, one would have to recurse through __class__.__bases__.
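For hooks buried deeper than immediate ancestors, one alternative to recursing by hand is to walk the method resolution order, which Python has already flattened. The following is a sketch in plain Python under stated assumptions: HookBase stands in for ndb.Model, and TagLikeMixin, AuditMixin, Intermediate and MyModel are hypothetical classes invented for illustration. It only covers instance-method hooks such as _post_put_hook. A classmethod hook would need different handling.

```python
class HookBase(object):
    """Stand-in for ndb.Model: provides a default no-op hook."""
    def _post_put_hook(self, future):
        pass


class TagLikeMixin(object):
    def _post_put_hook(self, future):
        self.calls.append("TagLikeMixin")


class AuditMixin(object):
    def _post_put_hook(self, future):
        self.calls.append("AuditMixin")


class Intermediate(TagLikeMixin, HookBase):
    """Puts TagLikeMixin one level deeper than an immediate ancestor."""


class MyModel(AuditMixin, Intermediate):
    def __init__(self):
        self.calls = []

    def _post_put_hook(self, future):
        self.calls.append("MyModel")
        # Walk every ancestor in method resolution order and call only
        # the hooks each class defines itself (not inherited ones).
        for klass in type(self).__mro__[1:]:
            hook = klass.__dict__.get("_post_put_hook")
            if hook is not None:
                hook(self, future)


model = MyModel()
model._post_put_hook(None)
print(model.calls)
```

Looking the hook up in each class's own __dict__, rather than with getattr, avoids calling the same inherited hook once per subclass.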
License
The code above is licensed under the MIT license (https://brianmhunt.mit-license.org/).