Accumulators

Accumulators are used within Pipeline.group() to compute aggregate values for each group. All accumulator classes live in gault.accumulators and are dataclasses that extend Accumulator.

from gault.accumulators import Sum, Avg, Count, Min, Max, First, Last, Push, AddToSet

Usage in group()

Accumulators are passed to Pipeline.group() as named expressions. Each accumulator must be aliased (given an output field name):

from gault import Pipeline
from gault.accumulators import Sum, Avg, Count

# Dict form
Pipeline().group(
    {"total_sales": Sum("$amount"), "avg_price": Avg("$price"), "doc_count": Count()},
    by="$category",
)

# Spread form (each accumulator aliased with .alias())
Pipeline().group(
    Sum("$amount").alias("total_sales"),
    Avg("$price").alias("avg_price"),
    Count().alias("doc_count"),
    by="$category",
)

# Group all documents (by=None)
Pipeline().group({"grand_total": Sum("$amount")}, by=None)

All accumulators support the .alias(name) method inherited from AsAlias, which wraps the accumulator in an Aliased container for use with the spread form.
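For orientation, the dict form and the spread form above presumably compile to the same $group stage. The sketch below writes out the expected stage documents as plain Python dicts, extrapolated from the per-accumulator outputs documented on this page. It is not generated by gault, and the `_id: None` mapping for by=None is an assumption based on MongoDB's convention for grouping all documents together.

```python
# Expected $group stage for both the dict form and the spread form.
# Extrapolated from the compiled outputs listed in this reference; not produced by gault.
expected_stage = {
    "$group": {
        "_id": "$category",
        "total_sales": {"$sum": "$amount"},
        "avg_price": {"$avg": "$price"},
        "doc_count": {"$count": {}},
    }
}

# Assumed stage for by=None: MongoDB puts every document in one group when _id is null.
expected_grand_total = {
    "$group": {"_id": None, "grand_total": {"$sum": "$amount"}}
}
```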


Accumulator (base class)

class Accumulator(ABC, AsAlias)

Abstract base class for all accumulators. Subclasses must implement:

def compile_expression(self, *, context: Context) -> MongoExpression
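To illustrate the contract, here is a self-contained sketch of how a dataclass subclass might satisfy compile_expression. Every class below (Context, AccumulatorSketch, SumSketch) is a hypothetical stand-in written for this example. None of it is gault's actual code, and the real Context and AsAlias types are not modeled.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


class Context:
    """Hypothetical stand-in for gault's Context; carries no state here."""


class AccumulatorSketch(ABC):
    """Hypothetical stand-in for the Accumulator base class."""

    @abstractmethod
    def compile_expression(self, *, context: Context) -> dict[str, Any]:
        """Return the MongoDB accumulator expression as a dict."""


@dataclass
class SumSketch(AccumulatorSketch):
    input: Any

    def compile_expression(self, *, context: Context) -> dict[str, Any]:
        # A real implementation would presumably compile nested expressions as well.
        return {"$sum": self.input}


compiled = SumSketch("$amount").compile_expression(context=Context())
print(compiled)  # {'$sum': '$amount'}
```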

Sum

@dataclass
class Sum(Accumulator)

Returns the sum of numeric values. Ignores non-numeric values.

| Parameter | Type | Description |
| --- | --- | --- |
| input | NumberExpression | Expression that resolves to a number. Use 1 to count documents. |

Sum("$amount")
# Compiles to: {"$sum": "$amount"}

Sum(1)
# Compiles to: {"$sum": 1}

Example:

Pipeline().group({"total": Sum("$price")}, by="$category")
# {"$group": {"_id": "$category", "total": {"$sum": "$price"}}}
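The "ignores non-numeric values" rule can be pictured with a plain Python emulation of what the server does for one group. This is a sketch of the documented MongoDB $sum behavior, not gault code; mongo_sum is a hypothetical helper name.

```python
def mongo_sum(values):
    """Emulate $sum for one group: add numeric values, skip everything else."""
    total = 0
    for value in values:
        # bool is a subclass of int in Python, but MongoDB treats booleans as non-numeric.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            total += value
    return total


amounts = [10, "n/a", 2.5, None, True]
result = mongo_sum(amounts)
print(result)  # 12.5
```

A group with no numeric values at all yields 0 under this rule.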


Avg

@dataclass
class Avg(Accumulator)

Returns the average of numeric values.

| Parameter | Type | Description |
| --- | --- | --- |
| input | NumberExpression | Expression that resolves to a number. |

Avg("$score")
# {"$avg": "$score"}

Example:

Pipeline().group({"avg_score": Avg("$score")}, by="$class")


Count

@dataclass
class Count(Accumulator)

Returns the number of documents in a group. Takes no parameters.

Count()
# {"$count": {}}

Example:

Pipeline().group({"total": Count()}, by="$status")
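MongoDB documents the $count accumulator (available since MongoDB 5.0) as equivalent to {"$sum": 1} inside $group, which is why Sum(1) is also suggested for counting. The plain Python sketch below emulates both on a small, made-up list of documents to show that they agree; it does not call gault or MongoDB.

```python
from collections import Counter

docs = [
    {"status": "open"},
    {"status": "closed"},
    {"status": "open"},
]

# Count() per group: one increment per document, regardless of its fields.
count_per_status = Counter(doc["status"] for doc in docs)

# Sum(1) per group: add the constant 1 for every document.
sum_one_per_status = {}
for doc in docs:
    sum_one_per_status[doc["status"]] = sum_one_per_status.get(doc["status"], 0) + 1

print(dict(count_per_status))  # {'open': 2, 'closed': 1}
```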


Min

@dataclass
class Min(Accumulator)

Returns the minimum value.

| Parameter | Type | Description |
| --- | --- | --- |
| input | NumberExpression | Expression that resolves to a comparable value. |

Min("$price")
# {"$min": "$price"}

Max

@dataclass
class Max(Accumulator)

Returns the maximum value.

| Parameter | Type | Description |
| --- | --- | --- |
| input | NumberExpression | Expression that resolves to a comparable value. |

Max("$price")
# {"$max": "$price"}

First

@dataclass
class First(Accumulator)

Returns the value from the first document in each group. The result is only meaningful when documents reach the group stage in a defined order, for example after a preceding $sort stage.

| Parameter | Type | Description |
| --- | --- | --- |
| input | NumberExpression | Expression to evaluate. |

First("$name")
# {"$first": "$name"}

Example:

Pipeline().sort({"date": -1}).group({"latest_name": First("$name")}, by="$category")
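The sort-then-group pattern can be emulated in plain Python to show why the ordering matters. The documents below are made up for illustration, and the code mimics the server-side behavior rather than calling gault.

```python
docs = [
    {"category": "a", "name": "old", "date": 1},
    {"category": "a", "name": "new", "date": 3},
    {"category": "b", "name": "only", "date": 2},
]

# Equivalent of sort({"date": -1}): newest documents first.
ordered = sorted(docs, key=lambda doc: doc["date"], reverse=True)

latest_name = {}
for doc in ordered:
    # setdefault keeps the first value seen per group, like $first.
    latest_name.setdefault(doc["category"], doc["name"])

print(latest_name)  # {'a': 'new', 'b': 'only'}
```

Without the sort, category "a" would yield "old" here, because that document happens to come first in the input.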


Last

@dataclass
class Last(Accumulator)

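Returns the value from the last document in each group. As with First, the result is only meaningful when documents reach the group stage in a defined order.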

| Parameter | Type | Description |
| --- | --- | --- |
| input | NumberExpression | Expression to evaluate. |

Last("$name")
# {"$last": "$name"}

FirstN

@dataclass
class FirstN(Accumulator)

Returns the first n values in each group.

| Parameter | Type | Description |
| --- | --- | --- |
| input | NumberExpression | Expression to evaluate. |
| n | int | Number of values to return. |

FirstN("$name", n=3)
# {"$firstN": {"input": "$name", "n": 3}}

LastN

@dataclass
class LastN(Accumulator)

Returns the last n values in each group.

| Parameter | Type | Description |
| --- | --- | --- |
| input | NumberExpression | Expression to evaluate. |
| n | int | Number of values to return. |

LastN("$name", n=3)
# {"$lastN": {"input": "$name", "n": 3}}
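As a rough mental model, FirstN and LastN behave like list slices over one group's values in pipeline order. The helpers below (first_n, last_n) are hypothetical names for a plain Python emulation, not part of gault.

```python
names = ["ann", "bob", "cy", "dee"]  # values of one group, in pipeline order


def first_n(values, n):
    """Emulate $firstN on one group's values."""
    return values[:n]


def last_n(values, n):
    """Emulate $lastN on one group's values; relative order is preserved."""
    return values[-n:] if n > 0 else []


print(first_n(names, 3))   # ['ann', 'bob', 'cy']
print(last_n(names, 3))    # ['bob', 'cy', 'dee']
print(first_n(names, 10))  # ['ann', 'bob', 'cy', 'dee']
```

When a group holds fewer than n values, all of them are returned.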

Push

@dataclass
class Push(Accumulator)

Returns an array of all values for each group (including duplicates).

| Parameter | Type | Description |
| --- | --- | --- |
| input | AnyExpression | Expression to evaluate. |

Push("$tag")
# {"$push": "$tag"}

Example:

Pipeline().group({"all_tags": Push("$tag")}, by="$category")


AddToSet

@dataclass
class AddToSet(Accumulator)

Returns an array of unique values for each group. The order of elements in the array is unspecified.

| Parameter | Type | Description |
| --- | --- | --- |
| input | NumberExpression | Expression to evaluate. |

AddToSet("$tag")
# {"$addToSet": "$tag"}

Example:

Pipeline().group({"unique_tags": AddToSet("$tag")}, by="$category")
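The difference between Push and AddToSet is easiest to see side by side. The following plain Python sketch emulates both on one group's values; the tag list is made up for illustration.

```python
tags = ["red", "blue", "red", "green", "blue"]

# Push: every value, duplicates included, in the order documents were processed.
pushed = list(tags)

# AddToSet: each distinct value once. MongoDB does not guarantee element order,
# so compare the result as a set rather than as a list.
unique = set(tags)

print(pushed)          # ['red', 'blue', 'red', 'green', 'blue']
print(sorted(unique))  # ['blue', 'green', 'red']
```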


Top

@dataclass
class Top(Accumulator)

Returns the top element within a group according to a sort order.

| Parameter | Type | Description |
| --- | --- | --- |
| sort_by | SortPayload | Sort specification. |
| output | AnyExpression \| list[AnyExpression] | Expression(s) to return. |

Top(sort_by={"score": -1}, output="$name")
# {"$top": {"sortBy": {"score": -1}, "output": "$name"}}

TopN

@dataclass
class TopN(Accumulator)

Returns the top n elements within a group.

| Parameter | Type | Description |
| --- | --- | --- |
| n | int | Number of elements. |
| sort_by | SortPayload | Sort specification. |
| output | AnyExpression \| list[AnyExpression] | Expression(s) to return. |

TopN(n=5, sort_by={"score": -1}, output="$name")
# {"$topN": {"n": 5, "sortBy": {"score": -1}, "output": "$name"}}

Bottom

@dataclass
class Bottom(Accumulator)

Returns the bottom element within a group according to a sort order.

| Parameter | Type | Description |
| --- | --- | --- |
| sort_by | SortPayload | Sort specification. |
| output | AnyExpression \| list[AnyExpression] | Expression(s) to return. |

Bottom(sort_by={"score": 1}, output="$name")
# {"$bottom": {"sortBy": {"score": 1}, "output": "$name"}}

BottomN

@dataclass
class BottomN(Accumulator)

Returns the bottom n elements within a group.

| Parameter | Type | Description |
| --- | --- | --- |
| n | int | Number of elements. |
| sort_by | SortPayload | Sort specification. |
| output | AnyExpression \| list[AnyExpression] | Expression(s) to return. |

BottomN(n=3, sort_by={"score": 1}, output=["$name", "$score"])
# {"$bottomN": {"n": 3, "sortBy": {"score": 1}, "output": ["$name", "$score"]}}
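The Top and Bottom families differ only in which end of the sorted group they read from. The sketch below emulates TopN and BottomN in plain Python for a single group; top_n and bottom_n are hypothetical helper names, the sample documents are made up, and tie-breaking is not modeled.

```python
docs = [
    {"name": "ann", "score": 70},
    {"name": "bob", "score": 95},
    {"name": "cy", "score": 82},
    {"name": "dee", "score": 60},
]


def top_n(group, n, key, descending, output):
    """Emulate $topN: sort, keep the first n, project the output field."""
    ordered = sorted(group, key=lambda doc: doc[key], reverse=descending)
    return [doc[output] for doc in ordered[:n]]


def bottom_n(group, n, key, descending, output):
    """Emulate $bottomN: sort the same way, keep the last n."""
    ordered = sorted(group, key=lambda doc: doc[key], reverse=descending)
    return [doc[output] for doc in ordered[-n:]]


# sort_by={"score": -1}: highest scores first.
print(top_n(docs, 2, "score", True, "name"))     # ['bob', 'cy']
print(bottom_n(docs, 2, "score", True, "name"))  # ['ann', 'dee']
```

One consequence: with no tied scores, Top(sort_by={"score": -1}, ...) and Bottom(sort_by={"score": 1}, ...) select the same document, the one with the highest score.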

MinN

@dataclass
class MinN(Accumulator)

Returns the n minimum valued elements within a group.

| Parameter | Type | Description |
| --- | --- | --- |
| input | NumberExpression | Expression to evaluate. |
| n | int | Number of minimum values. |

MinN("$price", n=3)
# {"$minN": {"input": "$price", "n": 3}}

MaxN

@dataclass
class MaxN(Accumulator)

Returns the n maximum valued elements within a group.

| Parameter | Type | Description |
| --- | --- | --- |
| input | NumberExpression | Expression to evaluate. |
| n | int | Number of maximum values. |

MaxN("$price", n=3)
# {"$maxN": {"input": "$price", "n": 3}}
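Unlike TopN and BottomN, MinN and MaxN take no sort specification: they compare the input values directly. A plain Python emulation with heapq gives the idea. The price list is made up, and the order of the returned values follows heapq here, which may not match the server's ordering.

```python
import heapq

prices = [19.99, None, 4.5, 12.0, 7.25, None]

# MongoDB's $minN / $maxN skip null and missing values, so drop None first.
present = [price for price in prices if price is not None]

smallest = heapq.nsmallest(3, present)
largest = heapq.nlargest(3, present)

print(smallest)  # [4.5, 7.25, 12.0]
print(largest)   # [19.99, 12.0, 7.25]
```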

Median

@dataclass
class Median(Accumulator)

Returns an approximation of the median value. Uses the "approximate" method.

| Parameter | Type | Description |
| --- | --- | --- |
| input | NumberExpression | Expression that resolves to a number. |

Median("$score")
# {"$median": {"input": "$score", "method": "approximate"}}

Percentile

@dataclass
class Percentile(Accumulator)

Returns an approximation of percentile values. Uses the "approximate" method.

| Parameter | Type | Description |
| --- | --- | --- |
| input | NumberExpression | Expression that resolves to a number. |
| p | list[float] | Percentile values between 0.0 and 1.0 inclusive. |

Percentile("$score", p=[0.25, 0.5, 0.75])
# {"$percentile": {"input": "$score", "p": [0.25, 0.5, 0.75], "method": "approximate"}}
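To show the shape of the result (one value per entry in p), here is an exact nearest-rank percentile in plain Python. This is a swapped-in stand-in, not the server's algorithm: MongoDB's "approximate" method is documented as t-digest based, so its results can differ from an exact calculation on large inputs. The helper name nearest_rank_percentiles is hypothetical.

```python
import math


def nearest_rank_percentiles(values, p):
    """Exact nearest-rank percentiles over a non-empty list; one result per entry in p."""
    ordered = sorted(values)
    results = []
    for fraction in p:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("percentiles must be between 0.0 and 1.0 inclusive")
        # Nearest rank: the smallest value with at least fraction * N values at or below it.
        rank = max(1, math.ceil(fraction * len(ordered)))
        results.append(ordered[rank - 1])
    return results


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
quartiles = nearest_rank_percentiles(scores, [0.25, 0.5, 0.75])
print(quartiles)  # [3, 5, 8]
```

Under this stand-in, the median corresponds to the single entry returned for p=[0.5].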

StdDevPop

@dataclass
class StdDevPop(Accumulator)

Returns the population standard deviation of the input values.

| Parameter | Type | Description |
| --- | --- | --- |
| input | NumberExpression | Expression that resolves to a number. |

StdDevPop("$score")
# {"$stdDevPop": "$score"}

StdDevSamp

@dataclass
class StdDevSamp(Accumulator)

Returns the sample standard deviation of the input values.

| Parameter | Type | Description |
| --- | --- | --- |
| input | NumberExpression | Expression that resolves to a number. |

StdDevSamp("$score")
# {"$stdDevSamp": "$score"}
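The choice between StdDevPop and StdDevSamp comes down to the divisor: N for a full population, N - 1 for a sample. Python's statistics module exposes the same pair, which makes for a quick comparison on made-up scores.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation: divide the squared deviations by N.
population = statistics.pstdev(scores)

# Sample standard deviation: divide by N - 1, which gives a larger value.
sample = statistics.stdev(scores)

print(population)        # 2.0
print(round(sample, 3))  # 2.138
```

Use the population form when the group contains every value of interest, and the sample form when the group is a subset drawn from a larger set.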

MergeObjects

@dataclass
class MergeObjects(Accumulator)

Combines multiple documents into a single document. When used as a group accumulator, merges all documents in the group.

| Parameter | Type | Description |
| --- | --- | --- |
| input | ObjectExpression | Expression that resolves to a document. |

MergeObjects("$metadata")
# {"$mergeObjects": "$metadata"}

Example:

Pipeline().group({"combined": MergeObjects("$details")}, by="$category")
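When documents in a group share field names, MongoDB's $mergeObjects keeps the value from the last document merged and ignores null operands. The plain Python sketch below emulates that for one group; merge_objects is a hypothetical helper and the sample data is made up.

```python
def merge_objects(documents):
    """Emulate $mergeObjects over one group's values."""
    merged = {}
    for document in documents:
        if document is None:
            continue  # null operands are ignored
        merged.update(document)  # later values overwrite earlier ones
    return merged


details = [
    {"color": "red", "size": "M"},
    None,
    {"size": "L", "stock": 4},
]
combined = merge_objects(details)
print(combined)  # {'color': 'red', 'size': 'L', 'stock': 4}
```

Because the last document wins, sorting before the group stage is what makes the merged result deterministic.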


Quick reference table

| Accumulator | MongoDB operator | Parameters | Description |
| --- | --- | --- | --- |
| Sum | $sum | input | Sum of numeric values |
| Avg | $avg | input | Average of numeric values |
| Count | $count | (none) | Document count |
| Min | $min | input | Minimum value |
| Max | $max | input | Maximum value |
| First | $first | input | First value in group |
| Last | $last | input | Last value in group |
| FirstN | $firstN | input, n | First N values |
| LastN | $lastN | input, n | Last N values |
| Push | $push | input | Array of all values |
| AddToSet | $addToSet | input | Array of unique values |
| Top | $top | sort_by, output | Top element by sort |
| TopN | $topN | n, sort_by, output | Top N elements by sort |
| Bottom | $bottom | sort_by, output | Bottom element by sort |
| BottomN | $bottomN | n, sort_by, output | Bottom N elements by sort |
| MinN | $minN | input, n | N minimum values |
| MaxN | $maxN | input, n | N maximum values |
| Median | $median | input | Approximate median |
| Percentile | $percentile | input, p | Approximate percentiles |
| StdDevPop | $stdDevPop | input | Population standard deviation |
| StdDevSamp | $stdDevSamp | input | Sample standard deviation |
| MergeObjects | $mergeObjects | input | Merge documents |