From oracle8i onwards the inner query can have an
order by clause AND IS A REQUIREMENT IN THE SUB QUERY TO PERFORM the Top-N
analysis. Top-N queries are useful in scenarios where the need is to
display only the n top-most or the n bottom-most records from a table based on
a condition. This result set can be used for further analysis. For example, using Top-N analysis you can
perform the following types of queries:
The high-level structure of a top-n analysis query is: select [column_list], ROWNUM [ROWNUM_ALIAS] from (select [column_list] from table ORDER BY Top-N_column) where ROWNUM<=N; Performing “Top-N” Analysis Top-N queries use a consistent nested query structure with the elements described below:
The following is an example for performing
select ROWNUM as RANK, ename,sal from(select ename,sal from emp ORDER BY sal DESC) WHERE ROWNUM<=3; RANK ENAME SAL ---------- ---------- ---------- 1 KING 5000 2 SCOTT 3000 3 FORD 3000
select ROWNUM as RANK, ename,sal from(select ename,sal from emp ORDER BY sal) WHERE ROWNUM<=3; RANK ENAME SAL ---------- ---------- ---------- 1 SMITH 800 2 JAMES 950 3 ADAMS 1100
select ROWNUM as SENIORS, SENRS.ename,SENRS.hiredate from (select ename,hiredate from emp order by hiredate)SENRS where ROWNUM<=4; SENIORS ENAME HIREDATE ---------- ---------- --------- 1 SMITH 17-DEC-80 2 ALLEN 20-FEB-81 3 WARD 22-FEB-81 4 JONES 02-APR-81
select ROWNUM as SENIORS, SENRS.ename,SENRS.hiredate from (select ename,hiredate from emp order by hiredate DESC)SENRS where ROWNUM<=4; JUNIORS ENAME HIREDATE ---------- ---------- --------- 1 ADAMS 23-MAY-87 2 SCOTT 19-APR-87 3 MILLER 23-JAN-82 4 JAMES 03-DEC-81 SQL>select ROWNUM as SENIORS, SENRS.ename,SENRS.hiredate from (select ename,hiredate from emp order by hiredate)SENRS where ROWNUM<=4 order by rownum desc; SENIORS ENAME HIREDATE --------- ---------- --------- 4 JONES 02-APR-81 3 WARD 22-FEB-81 2 ALLEN 20-FEB-81 1 SMITH 17-DEC-80
It's common to run a query using only part of a dataset – for example, the top 100 salespeople in a company. In this article, we'll see how to use Oracle's Top-N query method to query just these rows. Top-N queries retrieve a defined number of rows (top or bottom) from a result set. In other words, they find the best or worst of something – the ten best selling cars in a certain region, the five most popular routers, the 20 worst-performing stores, etc. These kinds of queries are mostly used in Business Intelligence, where it's important to track the performance of certain entities and thus find ways of increasing profits or cutting costs. As database developer or BI developer, your role is to enable users to get results in a timely and dependable manner. In this article, we will explore some common ways of constructing Top-N queries in an Oracle database. First, we need to lay the groundwork for our queries by familiarizing ourselves with some basic analytical functions. If you aren't familiar with SQL functions, I recommend using Vertabelo Academy, particularly the Standard SQL Functions Course. Assuming you do have some background in SQL, let's get started by looking at the data in a simple sales table: This table contains the following information:
Now let's query this table and see what we have inside: SELECT * FROM sales ORDER BY amount DESC;Here is the result: Now suppose you want to query the top five sales, or the five biggest sales. Usually, people execute this query using the Oracle pseudocolumn ROWNUM. ROWNUM returns a number indicating the order that the row is selected from the table. Here's the code: SELECT * FROM sales WHERE rownum <=>But this is a mistake, as you can see from the result set: This is because ROWNUM is executed before the ORDER BY statement. To get around this, you use the inline view (a SELECT statement within the FROM clause of another SELECT, as shown below.) This tells Oracle to sort the data first. select * from (select * from sales order by amount desc) where rownum <=>And we get the correct result: However, there is one problem: how we define the best sale. Are the five best sales the first five distinct best? The five non-distinct best? Or the first five ranked sales? The definition of the best sales will come from the business side of things, but figuring out the solution is up to you. What Is a Top-N Query?Now that we understand the business problems and the data, let's move on to Top-N queries. I'll look at each of these query types:
Top-N queries don't have a special command or keyword. They utilize an ordered inline view, select results from that view, and then limit the number of rows using ROWNUM. Note for Oracle 12c users: Oracle 12c introduced a new clause, FETCH FIRST. This allows us to write Top-N queries by adding the FETCH FIRST clause at the end of a basic query. I've included examples of this method as well. This is the classic Top-N. It is non-distinct, meaning that there may be gaps in the ranking if two or more values are the same. The ROW_NUMBER analytical function returns a unique row for each returned row. We can take this query further than we can one with ROWNUM because of the windowing aspect of ROW_NUMBER. To learn more about analytical functions in Oracle, start here. The query is : SELECT * FROM ( SELECT sales.*, ROW_NUMBER() OVER (ORDER BY amount DESC) AS amount_dense_rank FROM sales ) WHERE amount_dense_rank <=>And the result we get is: Oracle 12c SyntaxTo get the same result using Oracle 12c, we simply write: SELECT amount FROM sales ORDER BY amount DESC FETCH FIRST 3 ROWS ONLY;And the result we get is: Top-N DistinctTo get a Top-N with distinct results (i.e. no gaps in the ranked results), we use the DENSE_RANK analytical function. DENSE_RANK is similar to RANK (which we will discuss below), but it returns without any gaps in the results. We would use the Top-N Distinct query type when we want to see all the values and all the corresponding rows. Here is an example of a Top-N query with DENSE_RANK: SELECT * FROM ( SELECT sales.*, DENSE_RANK() OVER (ORDER BY amount DESC) AS amount_dense_rank FROM sales ) WHERE amount_dense_rank <=>And the result: We see that the same amounts are given the same rank – there are multiple 1s, 2s, and 3s – but there are no gaps. We will receive the three highest distinct values. Top-N Non-Distinct with Ranked ValuesA Top-N non-distinct query with ranked values will return an N number of the highest values and all the corresponding rows. To implement this, we use the RANK analytical function, which produces a sequential rank for each distinct value in the specified window. Let's look at an example with three amounts: SELECT * FROM ( SELECT sales.*, RANK() OVER (ORDER BY amount DESC) AS amount_rank FROM sales ) WHERE amount_rank <=>And the results : The RANK function returns the same number for non-distinct values and skips the number of distinct values to keep the rank consistent. If we had three sales that all had amounts of 100, there would be three "first-place" values with the amount of 100 and the next rank would be four. Oracle 12c SyntaxYou can achieve the same ranking results using Oracle 12c's new syntax feature: SELECT amount FROM sales ORDER BY amount DESC FETCH FIRST 3 ROWS WITH TIES;And we get the same result: We've explored three simple methods of creating Top-N queries with analytical functions. And we've explored a new syntax available in the Oracle 12c database. While we've concentrated on finding the best performers, an inverse query that finds the worst performers is also possible. All you'd need to do is reverse the order of the analytical functions or inline views. Why not put in some practice with creating queries? I recommend the SQL Window Functions online course – the only interactive course for SQL analytical functions on the Internet. See what you can come up with! |