Group Definition
Typically, there are as many groups as there are unique values in the grouping variable.
(A categorical array also can include categories that are not represented in the data;
those categories do not define groups.) The groups and the order of the groups depend on
the data type of the grouping variable.
- For numeric, logical, datetime, or duration vectors, or cell arrays of character
vectors, the groups correspond to the unique values sorted in ascending order.
- For categorical arrays, the groups correspond to the unique values observed in the
array, sorted in the order returned by the categories function.
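The following minimal sketch illustrates both ordering rules. The data values are made up for illustration. The numeric vector A is grouped by its sorted unique values. The categorical array c declares the category 'high', but 'high' never occurs in the data, so it gets no group.

```matlab
% Numeric grouping variable: groups follow the ascending unique values
A = [10; 5; 10; 7];
G = findgroups(A);          % unique values 5, 7, 10 -> groups 1, 2, 3
                            % G contains 3, 1, 3, 2

% Categorical grouping variable with categories ordered 'high', 'med', 'low'.
% 'high' is not represented in the data, so it does not define a group.
c = categorical({'med'; 'low'; 'med'}, {'high', 'med', 'low'});
[Gc, IDc] = findgroups(c);  % groups follow categories(c): 'med' -> 1, 'low' -> 2
                            % Gc contains 1, 2, 1; IDc lists med, low
```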
The findgroups function can accept multiple grouping variables, for example G =
findgroups(A1,A2). You also can include multiple grouping variables in a table, for
example T = table(A1,A2); G = findgroups(T). The findgroups function defines
groups by the unique combinations of values across corresponding elements of the
grouping variables. findgroups orders the groups first by the order of the first grouping
variable, then by the order of the second grouping variable, and so on. For example,
if A1 = {'a','a','b','b'} and A2 = [0 1 0 0], then the unique combinations of values
across the grouping variables are ('a', 0), ('a', 1), and ('b', 0), defining three groups.
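This sketch repeats the A1/A2 example with findgroups called two ways: on separate grouping variables and on a table. It uses column vectors instead of the row vectors in the text so that A1 and A2 can serve directly as table variables. The extra outputs ID1 and ID2 give, for each group, the value of the corresponding grouping variable.

```matlab
% Two grouping variables with corresponding elements (columns so they fit in a table)
A1 = {'a'; 'a'; 'b'; 'b'};
A2 = [0; 1; 0; 0];

% Groups are the unique (A1, A2) combinations, ordered by A1 first, then A2
[G, ID1, ID2] = findgroups(A1, A2);
% G contains 1, 2, 3, 3 for the groups ('a',0), ('a',1), ('b',0)
% ID1 lists 'a', 'a', 'b' and ID2 lists 0, 1, 0 (one entry per group)

% Same grouping variables collected in a table
T = table(A1, A2);
GT = findgroups(T);         % same group numbers as findgroups(A1, A2)
```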
The Split-Apply-Combine Workflow
After you select grouping variables and split data variables into groups, you can apply
functions to the groups and combine the results. This workflow is called the Split-Apply-
Combine workflow. You can use the findgroups and splitapply functions together to
analyze groups of data in this workflow. This diagram shows a simple example using the
grouping variable Gender and the data variable Height to calculate the mean height by
gender.
The findgroups function returns a vector of group numbers that define groups based on
the unique values in the grouping variables. splitapply uses the group numbers to split
the data into groups efficiently before applying a function.
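The diagram's Gender/Height example can be sketched in code as follows. The sample values are hypothetical, chosen only to make the arithmetic easy to check. findgroups performs the split by numbering the groups. splitapply applies mean to each group and stacks the per-group results, one row per group. The final table pairs each group's identifier with its result.

```matlab
% Hypothetical sample data (not taken from the original diagram)
Gender = {'M'; 'F'; 'M'; 'F'; 'M'};
Height = [70; 64; 72; 66; 68];

% Split: number the groups by the unique values of Gender ('F' -> 1, 'M' -> 2)
[G, genderGroups] = findgroups(Gender);

% Apply and combine: mean height of each group, one row per group
meanHeight = splitapply(@mean, Height, G);   % 65 for 'F', 70 for 'M'

% Pair each group identifier with its result
results = table(genderGroups, meanHeight)
```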