1. What we can do with analytical functions can be done through simple joins and subqueries, but aggregate functions are always faster or atleast equal to native SQL queries.
2. The “group by” works similar to the aggregate function, but the “non-group by” columns are not allowed in the query. In aggregate func, this restriction is omitted and we can see the non-group by data also and the grouped value is repeated for all the results.
select student_class, count(*) cnt from students
where class in ('6A','6B')
group by class
select student_class, house, count(*) over (partition by student_class) cnt from students
where class in ('CF','CHILD')
3. The Analytical functions are computed after all the where, join, group by and having clauses are complete and before the order by clause. So, the analytical functions can appear only in the select clause and main order by clause of the query.
4. If the condition inside the OVER() clause is left empty, the analytical functions works on the entire resultset. We can use any non-analytical function in the OVER() clause.
select status, ext_batch_id, count(*) over () cnt from gefi_batch_header
where status in ('CF','CHILD')
5. The functions SUM, COUNT, AVG, MIN, MAX are the common analytic functions the result of which does not depend on the order of the records.
Functions like LEAD, LAG, RANK, DENSE_RANK, ROW_NUMBER, FIRST, FIRST VALUE, LAST, LAST VALUE depends on order of records. In the next example we will see how to specify that. We can order the records using the ORDER BY clause inside the OVER() clause.
Now , we will explore each of the above mentioned analytical functions in detail:
ROW_NUMBER, RANK and DENSE_RANK
The order by clause of the row_number() function, does not need to be in the select part of the query. This is not possible in the normal order by clause used in the query.
select status, ext_batch_id, row_number() over (partition by status order by batch_id) cnt from gefi_batch_header
where status in ('CF','CHILD')
You can even omit the partitioning clause and still use the order by in the OVER() clause.
select status, ext_batch_id, row_number() over (order by batch_id) cnt from gefi_batch_header
where status in ('CF','CHILD')
RANK and DENSE_RANK, both provide a rank to the resultset rows, but there is a slight difference in the 2 approaches:
SELECT empno, deptno, sal,
RANK() OVER (PARTITION BY deptno
ORDER BY sal DESC NULLS LAST) RANK,
DENSE_RANK() OVER (PARTITION BY
deptno ORDER BY sal DESC NULLS
LAST) DENSE_RANK
FROM emp
WHERE deptno IN (10, 20)
ORDER BY 2, RANK;
EMPNO DEPTNO SAL RANK DENSE_RANK
------ ------- ----- ----- ----------
7839 10 5000 1 1
7782 10 2450 2 2
7934 10 1300 3 3
7788 20 3000 1 1
7902 20 3000 1 1
7566 20 2975 3 2
7876 20 1100 4 3
7369 20 800 5 4
8 rows selected.
|
As you can see from the query and its result, the RANK skips the rank number, if it finds 2 results with the same rank level. The next in order is assigned the next possible rank. But in case if DENSE_RANK, the values are not skipped and hence it is dense[may be].