Home » SQL & PL/SQL » SQL & PL/SQL » how to get the rows of different values of single column
how to get the rows of different values of single column [message #684197] |
Wed, 21 April 2021 07:14  |
 |
Sekhar6617
Messages: 22 Registered: March 2021
|
Junior Member |
|
|
Hi Team,
Requesting you please help me with the query for selecting rows with different values of single column.
Below is the scenario explained with example. In this below example PO_ID = 2675 should be returned as a output since it has two distinct date fields.
CREATE TABLE PO_TBL (
PO_ID int,
DATE DATE,
AMOUNT INT,
Address varchar(255),
City varchar(255)
);
INSERT INTO PO_TBL VALUES (2675, '02/01/2021', 665.00, 'TEXAS', 'TEXAS');
INSERT INTO PO_TBL VALUES (2675, '02/01/2021', 1023.00, 'TEXAS', 'TEXAS');
INSERT INTO PO_TBL VALUES (2675, '02/01/2021', 125.00, 'TEXAS', 'TEXAS');
INSERT INTO PO_TBL VALUES (2675, '02/25/2021', 78.00, 'TEXAS', 'TEXAS');
INSERT INTO PO_TBL VALUES (5360, '02/01/2021', 231.00, 'TEXAS', 'TEXAS');
INSERT INTO PO_TBL VALUES (5360, '02/01/2021', 349.00, 'TEXAS', 'TEXAS');
OUTPUT:
PO_ID DATE
2675 02/01/2021
2675 02/25/2021
I have tried with below sql query but results are not as expected.
SELECT A.PO_ID, A.DATE, COUNT(A.DATE) FROM PO_TBL A
WHERE A.DATE NOT IN
(SELECT DISTINCT B.DATE FROM PO_TBL B
WHERE A.PO_ID = B.PO_ID
AND A.DATE <> B.DATE)
GROUP BY A.PO_ID, A.DATE
Thank you.
Regards
Sekhar
|
|
|
Re: how to get the rows of different values of single column [message #684198 is a reply to message #684197] |
Wed, 21 April 2021 07:54   |
 |
Michel Cadot
Messages: 68421 Registered: March 2007 Location: Nanterre, France, http://...
|
Senior Member Account Moderator |
|
|
Always post your Oracle version, with 4 decimals (query v$version), as often solution depends on it.
DATE is not a valid column name:
SQL> CREATE TABLE PO_TBL (
2 PO_ID int,
3 DATE DATE,
4 AMOUNT INT,
5 Address varchar(255),
6 City varchar(255)
7 );
DATE DATE,
*
ERROR at line 3:
ORA-00904: : invalid identifier
SQL> CREATE TABLE PO_TBL (
2 PO_ID int,
3 my_DATE DATE,
4 AMOUNT INT,
5 Address varchar(255),
6 City varchar(255)
7 );
Table created.
SQL> INSERT INTO PO_TBL VALUES (2675, '02/01/2021', 665.00, 'TEXAS', 'TEXAS');
INSERT INTO PO_TBL VALUES (2675, '02/01/2021', 665.00, 'TEXAS', 'TEXAS')
*
ERROR at line 1:
ORA-01843: not a valid month
ALWAYS use TO_DATE with a format mask to insert a date from a string.
Also note: how Oracle would know if 02/01 is January, 2nd or February, 1st?
Correct syntax is:
SQL> INSERT INTO PO_TBL VALUES (2675, TO_DATE('02/01/2021','MM/DD/YYYY'), 665.00, 'TEXAS', 'TEXAS');
1 row created.
Now for the question:
SQL> select * from PO_TBL order by 1, 2;
PO_ID MY_DATE AMOUNT ADDRESS CITY
---------- ------------------- ---------- ---------- ----------
2675 01/02/2021 00:00:00 665 TEXAS TEXAS
2675 01/02/2021 00:00:00 1023 TEXAS TEXAS
2675 01/02/2021 00:00:00 125 TEXAS TEXAS
2675 25/02/2021 00:00:00 78 TEXAS TEXAS
5360 01/02/2021 00:00:00 231 TEXAS TEXAS
5360 01/02/2021 00:00:00 349 TEXAS TEXAS
6 rows selected.
SQL> select distinct po_id, my_date
2 from po_tbl
3 where po_id in ( select po_id from po_tbl
4 group by po_id
5 having count(distinct my_date) > 1 )
6 order by po_id, my_date
7 /
PO_ID MY_DATE
---------- -------------------
2675 01/02/2021 00:00:00
2675 25/02/2021 00:00:00
2 rows selected.
|
|
|
|
Re: how to get the rows of different values of single column [message #684200 is a reply to message #684198] |
Wed, 21 April 2021 12:27   |
John Watson
Messages: 8805 Registered: January 2010 Location: Global Village
|
Senior Member |
|
|
Here's another solution:orclz> select distinct p.po_id,p.my_Date from po_tbl p where p.my_date <> any (select q.my_date from po_tbl q where p.po_id=q.po_id);
PO_ID MY_DATE
--------------- -------------------
2675 2021-02-25:00:00:00
2675 2021-02-01:00:00:00
I love finding a reason to use the ANY or ALL operators. On my system, the cost is lower than Michel's query:orclz> set autot trace exp
orclz> select distinct p.po_id,p.my_Date from po_tbl p where p.my_date <> any (select q.my_date from po_tbl q where p.po_id=q.po_id);
Execution Plan
----------------------------------------------------------
Plan hash value: 2711566224
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6 | 264 | 7 (15)| 00:00:01 |
| 1 | HASH UNIQUE | | 6 | 264 | 7 (15)| 00:00:01 |
|* 2 | HASH JOIN SEMI | | 6 | 264 | 6 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| PO_TBL | 6 | 132 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| PO_TBL | 6 | 132 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("P"."PO_ID"="Q"."PO_ID")
filter("P"."MY_DATE"<>"Q"."MY_DATE")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold
orclz>
orclz> select distinct po_id, my_date
2 from po_tbl
3 where po_id in ( select po_id from po_tbl
4 group by po_id
5 having count(distinct my_date) > 1 )
6 order by po_id, my_date;
Execution Plan
----------------------------------------------------------
Plan hash value: 4065634225
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 18 | 1008 | 8 (25)| 00:00:01 |
| 1 | SORT UNIQUE NOSORT | | 18 | 1008 | 8 (25)| 00:00:01 |
|* 2 | FILTER | | | | | |
| 3 | SORT GROUP BY | | 18 | 1008 | 8 (25)| 00:00:01 |
|* 4 | HASH JOIN | | 18 | 1008 | 7 (15)| 00:00:01 |
| 5 | VIEW | VM_NWVW_2 | 6 | 132 | 4 (25)| 00:00:01 |
| 6 | HASH GROUP BY | | 6 | 132 | 4 (25)| 00:00:01 |
| 7 | TABLE ACCESS FULL| PO_TBL | 6 | 132 | 3 (0)| 00:00:01 |
| 8 | TABLE ACCESS FULL | PO_TBL | 6 | 204 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(COUNT("$vm_col_1")>1)
4 - access("PO_ID"="$vm_col_2")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold
orclz>
|
|
|
Re: how to get the rows of different values of single column [message #684204 is a reply to message #684200] |
Thu, 22 April 2021 08:31   |
Solomon Yakobson
Messages: 3214 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
Single path through PO_TBL:
WITH T AS (
SELECT PO_ID,
MY_DATE,
COUNT(DISTINCT MY_DATE) OVER(PARTITION BY PO_ID) CNT,
ROW_NUMBER() OVER(PARTITION BY PO_ID,MY_DATE ORDER BY 1) RN
FROM PO_TBL
)
SELECT PO_ID,
MY_DATE
FROM T
WHERE CNT > 1
AND RN = 1
/
PO_ID MY_DATE
---------- ----------
2675 02/01/2021
2675 02/25/2021
Execution Plan
----------------------------------------------------------
Plan hash value: 118720869
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6 | 288 | 4 (25)| 00:00:01 |
|* 1 | VIEW | | 6 | 288 | 4 (25)| 00:00:01 |
| 2 | WINDOW SORT | | 6 | 132 | 4 (25)| 00:00:01 |
| 3 | TABLE ACCESS FULL| PO_TBL | 6 | 132 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CNT">1 AND "RN"=1)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
SQL>
SY.
|
|
|
Re: how to get the rows of different values of single column [message #684206 is a reply to message #684204] |
Thu, 22 April 2021 12:24   |
 |
Michel Cadot
Messages: 68421 Registered: March 2007 Location: Nanterre, France, http://...
|
Senior Member Account Moderator |
|
|
I hesitated to post a solution with analytic functions because this is one of the cases where they are most often less efficient.
If you create an index on the table like:
SQL> create index PO_TBL_idx on PO_TBL (po_id,my_date) compress 1;
Index created.
Then the previous queries use the index when the latest one does not (I removed the costs as they are meaningless for such a small table):
SQL> set autot trace exp
SQL> select distinct p.po_id,p.my_Date
2 from po_tbl p
3 where p.my_date <> any (select q.my_date from po_tbl q where p.po_id=q.po_id)
4 /
Execution Plan
----------------------------------------------------------
Plan hash value: 1247696620
----------------------------------------
| Id | Operation | Name |
----------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH UNIQUE | |
|* 2 | HASH JOIN | |
| 3 | INDEX FULL SCAN| PO_TBL_IDX |
| 4 | INDEX FULL SCAN| PO_TBL_IDX |
----------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("P"."PO_ID"="Q"."PO_ID")
filter("P"."MY_DATE"<>"Q"."MY_DATE")
SQL> WITH T AS (
2 SELECT PO_ID,
3 MY_DATE,
4 COUNT(DISTINCT MY_DATE) OVER(PARTITION BY PO_ID) CNT,
5 ROW_NUMBER() OVER(PARTITION BY PO_ID,MY_DATE ORDER BY 1) RN
6 FROM PO_TBL
7 )
8 SELECT PO_ID,
9 MY_DATE
10 FROM T
11 WHERE CNT > 1
12 AND RN = 1
13 /
Execution Plan
----------------------------------------------------------
Plan hash value: 118720869
--------------------------------------
| Id | Operation | Name |
--------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | VIEW | |
| 2 | WINDOW SORT | |
| 3 | TABLE ACCESS FULL| PO_TBL |
--------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CNT">1 AND "RN"=1)
Then index should be far smaller than the table (about 10 bytes per row in the index over several hundred for the table).
Of course, best query depends on the table cardinality and data.
(Note: I used a 11.2.0.4 db for this, maybe a later version knows to use the index with analytic functions.)
|
|
|
|
|
Re: how to get the rows of different values of single column [message #684209 is a reply to message #684208] |
Thu, 22 April 2021 16:43   |
Solomon Yakobson
Messages: 3214 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
Michel, reason why John's solution is using index is simple - optimizer understands where p.my_date <> any (select q.my_date from po_tbl q where p.po_id=q.po_id) will not produce TRUE for rows where PO_ID is null or MY_DATE is null. And, BTW, it makes this solution a non-working solution for NULL PO_ID, althought most likely it is PK. Anyway, if we want analytics to take advantage of indexing PO_ID, MY_DATE and both are nullable then all we need is either add where po_id is not null and my_date is not null (but that's again could be an issue if PO_ID can be NULL) or add a constant to index to make sure all rows are indexed:
SQL> CREATE INDEX PO_TBL_IDX ON PO_TBL(PO_ID,MY_DATE);
Index created.
SQL> SET AUTOTRACE TRACE EXP
SQL> WITH T AS (
2 SELECT PO_ID,
3 MY_DATE,
4 COUNT(DISTINCT MY_DATE) OVER(PARTITION BY PO_ID) CNT,
5 ROW_NUMBER() OVER(PARTITION BY PO_ID,MY_DATE ORDER BY 1) RN
6 FROM PO_TBL
7 WHERE PO_ID IS NOT NULL
8 AND MY_DATE IS NOT NULL
9 )
10 SELECT PO_ID,
11 MY_DATE
12 FROM T
13 WHERE CNT > 1
14 AND RN = 1
15 /
Execution Plan
----------------------------------------------------------
Plan hash value: 1314444089
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6 | 288 | 1 (0)| 00:00:01 |
|* 1 | VIEW | | 6 | 288 | 1 (0)| 00:00:01 |
| 2 | WINDOW BUFFER | | 6 | 132 | 1 (0)| 00:00:01 |
|* 3 | INDEX FULL SCAN| PO_TBL_IDX | 6 | 132 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CNT">1 AND "RN"=1)
3 - filter("PO_ID" IS NOT NULL AND "MY_DATE" IS NOT NULL)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
SQL> SET AUTOTRACE OFF
SQL> DROP INDEX PO_TBL_IDX;
Index dropped.
SQL> CREATE INDEX PO_TBL_IDX ON PO_TBL(PO_ID,MY_DATE,1);
Index created.
SQL> SET AUTOTRACE TRACE EXP
SQL> WITH T AS (
2 SELECT PO_ID,
3 MY_DATE,
4 COUNT(DISTINCT MY_DATE) OVER(PARTITION BY PO_ID) CNT,
5 ROW_NUMBER() OVER(PARTITION BY PO_ID,MY_DATE ORDER BY 1) RN
6 FROM PO_TBL
7 )
8 SELECT PO_ID,
9 MY_DATE
10 FROM T
11 WHERE CNT > 1
12 AND RN = 1
13 /
Execution Plan
----------------------------------------------------------
Plan hash value: 1314444089
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6 | 288 | 1 (0)| 00:00:01 |
|* 1 | VIEW | | 6 | 288 | 1 (0)| 00:00:01 |
| 2 | WINDOW BUFFER | | 6 | 132 | 1 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN| PO_TBL_IDX | 6 | 132 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CNT">1 AND "RN"=1)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
SQL>
SY.
|
|
|
|
Goto Forum:
Current Time: Thu Mar 30 22:49:22 CDT 2023
|