TIL: Parse Platform: '.include()' on queries results in N+1 queries

2022-06-23 00:00:00 +0000 UTC

A project I support at work uses Parse Platform and MongoDB in its backend. Parse provides a NodeJS API for DB queries called ParseQuery that includes a method ParseQuery.include(). This method “includes nested Parse.Objects for the provided key.”

That might sound a little weird given the appeal of Mongo – instead of foreign key relations & joins, Mongo supports nested documents. If you have an one-to-one relationship between an application user and some domain object, you can simply include that domain object as a nested document, just like a JSON subobject. Similarly, if you have a one-to-many relationship from one domain object to another, you can just include an array of subdocuments. Then instead of explicitly joining tables like you might in a normal-form relational database, you can instead explicitly exclude fields from returned documents using projections.

However the subdocument method for domain modeling might not always work out. Parse Platform provides a structured way to impose a relational system onto Mongo. ParseObjects are the application-side abstraction for this ORM-like system. Each ParseObject has its own collection in Mongo. A ParseObject can include pointers (the term used by Parse for foreign keys) to objects in other collections. The .include(...) call instructs Parse to “dereference” those pointers. Because Mongo is not a relational database, there are two options for this procedure:

Perform a second Mongo find() query against the foreign collection using the pointers from the local document.
Rewrite our query as an aggregation pipeline using $lookup to perform the join.

Parse chooses option (1). This means each unique collection include()’d will result in another trip to the database. While (2) would more closely approximate the behavior of a relational database that can pull data from a foreign table in a single query through join, I am not sure it is possible or better for the software to transparently perform the rewrite (though it is certainly possible for a developer to rewrite their code to use a single aggregation). Here is how the single ParseQuery is transformed into multiple Mongo queries:

First, ParseQuery.include(...) in the JS SDK:

  include(...keys: Array<string | Array<string>>): ParseQuery {
    keys.forEach(key => {
      if (Array.isArray(key)) {
        this._include = this._include.concat(key);
      } else {
        this._include.push(key);
      }
    });
    return this;
  }

When we call include() in our query call chain, it just pushes the key into an internal array of included keys. So far so good. What does it do with this list? We can check ParseQuery.find(...) to see. I won’t reproduce the whole thing here, but it does not mention includes. However we can see it passes on a JSON representation of itself to a QueryController. The toJSON method includes []this relevant bit](https://github.com/parse-community/Parse-SDK-JS/blob/2f2a6bdb37f30e767a56a97b26bdd622b99700ac/src/ParseQuery.js#L436-L438):

    if (this._include.length) {
      params.include = this._include.join(',');
    }

Simple enough! We turn the list into a comma-separated string. So where does this get sent off to? We can see the default QueryController at the bottom of ParseQuery.js:

const DefaultController = {
  find(className: string, params: QueryJSON, options: RequestOptions): Promise<Array<ParseObject>> {
    const RESTController = CoreManager.getRESTController();
    return RESTController.request('GET', 'classes/' + className, params, options);
  },


  aggregate(className: string, params: any, options: RequestOptions): Promise<Array<mixed>> {
    const RESTController = CoreManager.getRESTController();


    return RESTController.request('GET', 'aggregate/' + className, params, options);
  },
};

So eventually this will be processed on the server as a RestQuery. There are a few intermediate steps but we eventually end up in a call to a method named handleInclude which calls out to includePath where we can find this snippet:

  const queryPromises = Object.keys(pointersHash).map(className => {
    const objectIds = Array.from(pointersHash[className]);
    let where;
    if (objectIds.length === 1) {
      where = { objectId: objectIds[0] };
    } else {
      where = { objectId: { $in: objectIds } };
    }
    var query = new RestQuery(config, auth, className, where, includeRestOptions);
    return query.execute({ op: 'get' }).then(results => {
      results.className = className;
      return Promise.resolve(results);
    });
  });

In short, the code to this point has generated a hashmap called pointersHash that indexes pointers (again, Parse’s name for foreign keys) by the class name (e.g. ParseObject type and therefore DB collection name). Parse finally maps over the keys to this map and generates one RestQuery per object we have included. These RestQuery objects eventually each produce a Mongo find query, taking their own trip to the database to produce a list of documents our original RestQuery adds as sub-objects in its eventual response.

Bottom line? Be careful running Parse.Query with .include()s. Depending on the use case, you might end up with a lot more trips to the database than you expected!

Tags: til parse mongo