Brian M Hunt
Splash by picsum.photos
Google App Engine tags for ndb
Apr 4, 2014

What is it?

This is a way to add tags functionality to classes derived from ndb.Model on Google App Engine by way of a mixin.

Why is it important?

Adding tags to data models is a popular phenomenon, but there does not appear to be a published way to accomplish the common functionality with ndb. This method ought to be reusable and performant.

How does it work?

Get the full gist is gist.github.com.

This concept has two ndb.Model derivates, the Tag and the TagMixin.

Design

The requirements here for me were as follows:

  1. The tags be stored with the model (i.e. can be accessed without “joins”)
  2. The Tag instances be updated automatically when a model is updated
  3. The updates apply only on changed Tag instances
  4. The update occur in parallel

I chose to store tags on the mixin as a repeated ndb.StringProperty, with a separate Tag model whose key can be generated from the tag (i.e. the tag blue becomes ndb.Key('Tag', 'tag__blue')). The Tag instances are updated in parallel by tasklets called from ndb model hooks, as seem below.

The Tag

This Tag class that keeps track of the popularity (count) of a tag and linked items.

from google.appengine.ext import ndb

MAX_TAGS_FOR_TAGGABLE = 1000
POPULAR_PAGE_SIZE = 30


class Tag(ndb.Model):
    """Keep track of data related to a tag added with the TagMixin class.
    """
    tag = ndb.StringProperty(required=True, indexed=True)
    count = ndb.IntegerProperty(default=0, indexed=True)
    linked = ndb.KeyProperty(repeated=True)
    created = ndb.DateTimeProperty(auto_now_add=True)
    modified = ndb.DateTimeProperty(auto_now=True)

    @staticmethod
    def tag_to_keyname(tag):
        return "tag__{}".format(tag.lower())

    @staticmethod
    def tag_to_key(tag):
        return ndb.Key("Tag", Tag.tag_to_keyname(tag))

    @classmethod
    def get_linked_by_tag(self, tag, limit=MAX_TAGS_FOR_TAGGABLE):
        """Return the set of keys for the given tag"""
        try:
            return Tag.tag_to_key(tag).get().linked
        except AttributeError:
            return []

    @classmethod
    def get_or_create_async(cls, tag):
        """Return a future for a Tag instance for the given tag
        """
        return Tag.get_or_insert_async(Tag.tag_to_keyname(tag), tag=tag)

    @classmethod
    def get_popular_query(cls, page_size=POPULAR_PAGE_SIZE):
        return Tag.query().order(-Tag.count)

    def unlink_async(self, key):
        self.linked.remove(key)
        self.count -= 1
        return self.put_async()

    def link_async(self, key):
        self.linked.append(key)
        self.count += 1
        return self.put_async()

The TagMixin

The mixin that one would use to add tags to models is as follows:

class TagMixin(object):
    """A mixin that adds taggability to a class.

    Adds a 'tags' property.
    """
    tags = ndb.StringProperty(repeated=True, indexed=True,
                              validator=lambda p, v: v.lower())

    def _post_get_hook(self, future):
        """Set the _tm_tags so we can compare for changes in pre_put
        """
        self._tm_tags = future.get_result().tags

    def _post_put_hook(self, future):
        """Modify the associated Tag instances to reflect any updates
        """
        old_tagset = set(getattr(self, '_tm_tags', []))
        new_tagset = set(self.tags)

        # These are tags that have changed
        added_tags = new_tagset - old_tagset
        deleted_tags = old_tagset - new_tagset

        # Get the key for this post
        self_key = future.get_result()

        @ndb.transactional_tasklet
        def update_changed(tag):
            tag_instance = yield Tag.get_or_create_async(tag)
            if tag in added_tags:
                yield tag_instance.link_async(self_key)
            else:
                yield tag_instance.unlink_async(self_key)

        ndb.Future.wait_all([
            update_changed(tag) for tag in added_tags | deleted_tags
        ])

        # Update for any successive puts on this model.
        self._tm_tags = self.tags

Usage

Given a model with the mixin like this:

>>> class TagModel(TagMixin, ndb.Model):
...    name = ndb.StringProperty()  # just an arbitrary property

# And an instance, for illustration
>>> tm = TagModel()

# Getting and setting tags is done as normal
# for a property of an `ndb.Model`:
>>> TagModel(tags=["new", "hot"]).put()

>>> tm.tags = ["hot", "plasma"]
>>> tm.put()
>>> len(tm.tags)
2

# How many tags are there? "hot", "new", "plasma"
>>> Tag.query().count()
3

# Get the `ndb.Key` instances for models for a given tag:
>>> Tag.get_linked_by_tag("plasma")
ndb.Key('TagModel', 0)

# Query the tags by popularity:
>>> popular_tags = Tag.get_popular_query().fetch(3)

# Convert t list of `Tag` instances to strings with e.g.
>>> [t.tag for t in Tag.popular_tags]
['hot', 'plasma', 'new']

It is straightforward to add for example getting recently updated or old tags with: Tag.query().order(-Tag.modified)

The unit tests in the associated gist illustrate usage and expectations in better detail.

Careful Be mindful that the mixin uses the hooks _post_get_hook and _post_put_hook. If your models also use these you will need a way to call them e.g. calling from your hooks the respective self.__class__.__bases__[0]._post_{get,put}_hook, where 0 corresponds to the index of TagMixin in the __bases__. One could call the hooks for all immediate ancestors with something like this:

  def _post_put_hook(self, future):
    # My hook stuff
    for base in self.__class__.bases:
      try:
        other_hook = types.MethodType(
           getattr(base, '_post_put_hook'), self, TaggedClass
        )
      except AttributeError:
        continue
      other_hook(future)

If the hooks are buried deeper than immediate ancestors one would have to recurse through __class__.__bases__.

License

Code above is licensed under the MIT https://brianmhunt.mit-license.org/ license.

See also