Hello.
Question on sql
under the GROUP BY
clause.
-
Consider grouping by ONE column. Example:
SELECT DEPARTMENT_ID, SUM (SALARY) FROM Employees GROUP BY DEPARTMENT_ID;
That is, the
DEPARTMENT_ID
column looks for a unique (similar toDISTINCT
) department value, for example, 30, then all rows that mention department 30 in this table, from these rows the values from theSALARY
column are taken and are summed (SUM
). Then a other buyer is searched for and everything is repeated. As a result, I get how much money each department received in general.I don’t understand the moment: I have 6 rows in which there is a
DEPARTMENT_ID
column with a value of 30. Which of the rows will go to the table –SELECT
and why? That is, theEmployees
table had six rows withDEPARTMENT_ID
30, while theSELECT
table had only one . How does this grouping work in general? -
Consider grouping by two columns. I don’t understand her at all. I could not even find a normal picture, from which it would be clear. Looked through a bunch of articles and books on this subject, but did not understand anything.
Answer 1, authority 100%
I will add with an example of queries and output GROUP BY in two fields.
You can look at the table in which, for example, it is saved which user (user_id) contributed money, to which account (account) and how much (balance).
For example, you need to know how much each user has contributed to each of their accounts.
SELECT MIN (user_id), MIN (account), SUM (balance) FROM `t1` GROUP BY user_id, account ;
GROUP BY works on two fields as well as on one, first sorts, and then looks, if both values in the line are the same as in the previous line, then it groups these lines. If at least one value is not the same as in the previous line, then there will be no grouping. For 3 or more fields, GROUP BY works the same.
Result:
Answer 2, authority 95%
None of the original lines will be included in the selection after group by
. At the output, the aggregate is the sum of the data in the desired section. Columns that you have not explicitly applied any group functions to (such as sum ()
) will be assigned a first-found function. And only in MySQL and only with the ONLY_FULL_GROUP_BY
option disabled. In the rest of the DBMS, a query in which at least one column that is not a cut specified in the group by “forgot” to apply the group function will generate an error.
How group by
works can be estimated in Excel. Write the data to the sheet, sort by the fields that should be in group by
. Reading sorted data in a row in any case when the value in the next row in the columns specified in group by
differs from the values in the previous one – insert a new row, copy the values of the columns group by
, and in the rest place formulas like SUM ()
the cells of the group under which the summarizes. The result of the group by
is exactly these inserted totals. The DBMS works according to approximately the same algorithm – first it sorts, then sums up consecutive identical records.
I will add about MySQL – he is still too loose about this. Try to always explicitly apply group functions to all columns in order to understand what exactly will appear in them, because the ‘first one’ is not standardized and can vary from version to version and depending on the physical location of the records on disk and the query execution plan.