Aggregation Framework

The Aggregation Framework, in its simplest form, is another way to query data in MongoDB. Any query that can be written with MQL can also be expressed with the Aggregation Framework.

Syntax¶

Find all documents that have Wifi as one of the amenities, including only price and address in the resulting cursor.

Using MQL, the following command could be used:

db.listingsAndReviews.find(
    {"amenities": "Wifi"},
    {"price": 1, "address": 1, "_id":0}
).pretty()

Using the Aggregation Framework, the following command could be used to produce the same results:

db.listingsAndReviews.aggregate(
    [
        { $match: {"amenities": "Wifi" } },
        { $project: { "price": 1, "address": 1, "_id": 0 } }
    ]
)

The aggregate command goes beyond finding and projecting data. It also allows documents to be grouped and aggregated, and it can be used for counting documents.
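As a small illustration of the counting use case, the sketch below replaces the $project stage with a $count stage. The output field name wifiListings is an arbitrary choice made for this example, not a field from the collection.

```javascript
// Pipeline that counts the matching documents instead of returning them.
// "wifiListings" is a hypothetical name for the output field.
const countWifiPipeline = [
    { $match: { "amenities": "Wifi" } },
    { $count: "wifiListings" }
];

// In the MongoDB shell this would be run as:
//   db.listingsAndReviews.aggregate(countWifiPipeline)
// which returns a single document of the form { "wifiListings": <number> }.
```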

Syntax Breakdown¶

db.listingsAndReviews.aggregate(
    [
        { $match: {"amenities": "Wifi" } },
        { $project: { "price": 1, "address": 1, "_id": 0 } }
    ]
)

The aggregate command tells MongoDB to run the pipeline that follows against the collection.
An array is then passed to the aggregate command. Each element of the array is one stage.
Remember that the order of elements is important! Stages are run according to their position in the array.
The Aggregation Framework works as a pipeline, where the order of actions in the pipeline matters. Each stage is executed in the order listed and works on the output of the previous stage.
The transformed data is then returned at the end of the pipeline as the output.
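To see why stage order matters, the points above can be sketched in plain JavaScript. This is only an in-memory analogy under simplifying assumptions, not how MongoDB executes a pipeline; the helper runPipeline, the stage functions and the sample documents are all hypothetical.

```javascript
// Each stage is modelled as a function from an array of documents to a new array.
const matchWifi = (docs) =>
    docs.filter((doc) => (doc.amenities || []).includes("Wifi"));
const projectPrice = (docs) => docs.map((doc) => ({ price: doc.price }));

// Run the stages in array order, feeding each stage's output into the next one.
function runPipeline(docs, stages) {
    return stages.reduce((current, stage) => stage(current), docs);
}

// Hypothetical sample documents.
const listings = [
    { price: 80, amenities: ["Wifi", "Kitchen"] },
    { price: 120, amenities: ["Pool"] }
];

// Filtering first keeps the Wifi listing, then trims it down to its price.
const matchThenProject = runPipeline(listings, [matchWifi, projectPrice]);
console.log(matchThenProject); // [ { price: 80 } ]

// Projecting first drops the amenities field, so the later match finds nothing.
const projectThenMatch = runPipeline(listings, [projectPrice, matchWifi]);
console.log(projectThenMatch); // []
```

The same two stages give different results depending on their position, because each stage only sees what the previous stage passed on.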

Pipeline¶

Looking at the syntax above, it can be read as two separate filters.

The first filter is the $match stage. This acts as a filter that keeps all the documents without Wifi among their amenities from passing through to the next stage of the pipeline.

The second filter is the $project stage. This filters out all of the fields that are not address or price from each document. It is an even finer filter than the first, because it works on the fields inside each document rather than on whole documents.

For each stage, specify what you want to do and pass this into the next stage to be acted upon.
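The two filters can be pictured with JavaScript array methods: $match behaves roughly like filter on documents, and $project roughly like map on their fields. The sketch below is an analogy only, and the sample documents are made up for illustration.

```javascript
// Hypothetical documents shaped like the listings in the examples above.
const sampleListings = [
    { _id: 1, name: "Loft", price: 80, address: { country: "Portugal" }, amenities: ["Wifi", "Kitchen"] },
    { _id: 2, name: "Cabin", price: 120, address: { country: "Canada" }, amenities: ["Heating"] }
];

// $match analogue: keep only documents whose amenities array contains "Wifi".
const matched = sampleListings.filter((doc) => doc.amenities.includes("Wifi"));
console.log(matched.length); // 1

// $project analogue: keep only price and address, dropping _id and every other field.
const projected = matched.map((doc) => ({ price: doc.price, address: doc.address }));
console.log(projected);
// [ { price: 80, address: { country: 'Portugal' } } ]
```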

Beyond MQL using $group¶

There are many stages that set the Aggregation Framework apart from MQL. One of these is the $group stage. With MQL, data can be filtered or updated. With the Aggregation Framework, data can also be computed and reshaped.

$group is a stage that takes the incoming stream of data and siphons it into multiple distinct reservoirs.

%%{init: {'flowchart' : {'curve' : 'cardinal'}}}%%
flowchart 

    data(Data)

    match[$match]

    group{$group}

    A[Data A]
    B[Data B]
    C[Data C]
    D[Data D]

    data ==> match
    match ==> group

    group --> A
    group --> B
    group --> C
    group --> D

Note

The non-filtering stages within the Aggregation Framework do not modify the original data when they summarise, calculate and group data. Instead, they work with the data that they get from the previous stage in the pipeline, which is in its own cursor.
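The note can be illustrated with an in-memory sketch in plain JavaScript. It is an analogy under simplifying assumptions, with made-up sample documents: the grouping step builds new documents from its input, and the source data is left as it was.

```javascript
// Hypothetical source documents standing in for the collection.
const sourceDocs = [
    { name: "Loft", address: { country: "Portugal" } },
    { name: "Flat", address: { country: "Portugal" } },
    { name: "Cabin", address: { country: "Canada" } }
];
const snapshotBefore = JSON.stringify(sourceDocs);

// Grouping analogue: count documents per country into a separate object.
const countsByCountry = {};
for (const doc of sourceDocs) {
    const key = doc.address.country;
    countsByCountry[key] = (countsByCountry[key] || 0) + 1;
}

// Turn the counts into new output documents, one per group.
const grouped = Object.entries(countsByCountry)
    .map(([country, count]) => ({ _id: country, count }));
console.log(grouped);
// [ { _id: 'Portugal', count: 2 }, { _id: 'Canada', count: 1 } ]

// The source documents are untouched by the grouping step.
console.log(JSON.stringify(sourceDocs) === snapshotBefore); // true
```

Note that MongoDB does not guarantee the order of the documents that $group outputs; the order shown here is a property of this sketch only.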

$group Syntax¶

{
    "$group": {
        "_id": <expression>, // Group by expression. Identifies the group that the document belongs to.
        <field1>: { <accumulator1>: <expression1> }
    }
}

The first part of the $group syntax, the _id expression, identifies the group that each document belongs to. It can point at a nested field using dot-notation.

The second part of the $group syntax uses accumulators to compute a value for each group from the documents that are being passed through the pipeline.
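A stage can carry more than one accumulator field. The sketch below fills in the syntax with a group key and two accumulators, $sum and $avg. The output field names listingCount and averagePrice are hypothetical, chosen only for this example.

```javascript
// A $group stage with one group key and two accumulator fields.
// listingCount and averagePrice are hypothetical output field names.
const groupByCountryStage = {
    $group: {
        _id: "$address.country",          // group key: the country of each listing
        listingCount: { $sum: 1 },        // adds 1 for every document in the group
        averagePrice: { $avg: "$price" }  // averages the price field within the group
    }
};

// In the MongoDB shell this would be run as:
//   db.listingsAndReviews.aggregate([groupByCountryStage])
```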

$group Example¶

Find one document in the collection and only include the address field in the result.

db.listingsAndReviews.findOne({ },{ "address": 1, "_id": 0 })

Project only the address field value for each document, then group all documents into one document per address.country value.

db.listingsAndReviews.aggregate([
    { "$project": { "address": 1, "_id": 0 } },
    { "$group": { "_id": "$address.country" } }
])

Project only the address field value for each document, then group all documents into one document per address.country value, and count one for each document in each group.

db.listingsAndReviews.aggregate([
    { "$project": { "address": 1, "_id": 0 } },
    { "$group": { "_id": "$address.country",
                  "count": { "$sum": 1 } } }
])

In the example above, the $group command does the following:

{
     "$group": // Using the $group stage
        {
            "_id": "$address.country", // Identify the group that each document belongs to by its address.country value
            "count": { "$sum": 1 } // Create a new field, named count, in each document that the stage produces.
                                   // $sum is then used to add 1 to count for each document that falls into the group
        }
}

$group + $sum Breakdown¶

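Given the following $group stage:

{
    "$group": {
        "_id": "$category",
        "total": { "$sum": "$price" }
    }
}

and the following incoming documents:

[
    {
        "category": "fish",
        "price": 5
    },
    {
        "category": "meat",
        "price": 25
    },
    {
        "category": "fish",
        "price": 7
    }
]

the output contains one document per category, with total holding the sum of the price values in that group:

[
    {
        "_id": "fish",
        "total": 12
    },
    {
        "_id": "meat",
        "total": 25
    }
]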
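The arithmetic in this breakdown can be checked with a short in-memory sketch in plain JavaScript. The helper groupAndSum is hypothetical and only mimics what $group with $sum does for this small input; it is not MongoDB's implementation.

```javascript
// Group documents by keyField and add up valueField within each group.
function groupAndSum(docs, keyField, valueField) {
    const totals = new Map();
    for (const doc of docs) {
        const key = doc[keyField];
        totals.set(key, (totals.get(key) || 0) + doc[valueField]);
    }
    // Emit one output document per group, in first-seen order.
    return Array.from(totals, ([key, total]) => ({ _id: key, total }));
}

// The incoming documents from the breakdown.
const products = [
    { category: "fish", price: 5 },
    { category: "meat", price: 25 },
    { category: "fish", price: 7 }
];

console.log(groupAndSum(products, "category", "price"));
// [ { _id: 'fish', total: 12 }, { _id: 'meat', total: 25 } ]
```

The two fish documents fold into one group with 5 + 7 = 12, while the single meat document forms its own group with a total of 25.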