Showing posts with label Oracle. Show all posts
Showing posts with label Oracle. Show all posts

Friday, March 4, 2011

tidbit:Oracle:ITL Deadlock troubleshooting and analysis

How to identify ITL deadlock:

1. Pick slot number (0x40021) at the top

2. Search for it in a deadlock trace file

3. See if wait event is ITL related.

Resolution: Increase INI_TRANS on table or on index.


                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-00040021-00001409        66    2515     X             65    1253           S
TX-0005001b-000014d8        65    1253     X             66    2515           S


    Dumping one waiter:
      inst: 1, sid: 2513, ser: 41
      wait event: 'enq: TX - allocate ITL entry'
        p1: 'name|mode'=0x54580004
        p2: 'usn<<16 | slot'=0x40021
        p3: 'sequence'=0x1409
      row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0
      min_blocked_time: 0 secs, waiter_cache_ver: 31817

Sunday, February 20, 2011

Books:Book about Oracle Founder Larry Ellison


This book absolutely must read! If you are DBA or start up founder - this book opens really shocking insight how business was and is made in Silicon Valley.

Have you knew that Larry Ellison was not employee #1 of Oracle?

Have you knew that Oracle put on IPO papers that it has mainframe solution while it was not even functional?

Have you knew how exectly Oracle crush the competitors and grew sales?

Read the book!

Tidbit:Oracle:Exadata:Flash Disks Information

Use below commands to get information on Exadata Cell Flash Disks

CellCLI> list celldisk where diskType = FlashDisk

         FD_00_cell      normal
         FD_01_cell      normal
         FD_02_cell      normal
         FD_03_cell      normal

CellCLI> list celldisk where diskType = FlashDisk DETAIL

         name:                   FD_00_cell
         comment:
         creationTime:           2010-10-21T14:52:28-07:00
         deviceName:             /opt/oracle/cell11.2.2.1.0_LINUX_101005/disks/raw/FLASH01
         devicePartition:        /opt/oracle/cell11.2.2.1.0_LINUX_101005/disks/raw/FLASH01
         diskType:               FlashDisk
         errorCount:             0
         freeSpace:              0
         id:                     0000012b-d0c7-42f2-0000-000000000000
         interleaving:           none
         lun:                    /opt/oracle/cell11.2.2.1.0_LINUX_101005/disks/raw/FLASH01
         raidLevel:              "RAID 0"
         size:                   496M
         status:                 normal


         name:                   FD_01_cell
         comment:
         creationTime:           2010-10-21T14:52:30-07:00
         deviceName:             /opt/oracle/cell11.2.2.1.0_LINUX_101005/disks/raw/FLASH02
         devicePartition:        /opt/oracle/cell11.2.2.1.0_LINUX_101005/disks/raw/FLASH02
         diskType:               FlashDisk
         errorCount:             0
         freeSpace:              0
         id:                     0000012b-d0c7-61d3-0000-000000000000
         interleaving:           none
         lun:                    /opt/oracle/cell11.2.2.1.0_LINUX_101005/disks/raw/FLASH02
         raidLevel:              "RAID 0"
         size:                   496M
         status:                 normal


         name:                   FD_02_cell
         comment:
         creationTime:           2010-10-21T14:52:31-07:00
         deviceName:             /opt/oracle/cell11.2.2.1.0_LINUX_101005/disks/raw/FLASH03
         devicePartition:        /opt/oracle/cell11.2.2.1.0_LINUX_101005/disks/raw/FLASH03
         diskType:               FlashDisk
         errorCount:             0
         freeSpace:              0
         id:                     0000012b-d0c7-695e-0000-000000000000
         interleaving:           none
         lun:                    /opt/oracle/cell11.2.2.1.0_LINUX_101005/disks/raw/FLASH03
         raidLevel:              "RAID 0"
         size:                   496M
         status:                 normal


         name:                   FD_03_cell
         comment:
         creationTime:           2010-10-21T14:52:32-07:00
         deviceName:             /opt/oracle/cell11.2.2.1.0_LINUX_101005/disks/raw/FLASH04
         devicePartition:        /opt/oracle/cell11.2.2.1.0_LINUX_101005/disks/raw/FLASH04
         diskType:               FlashDisk
         errorCount:             0
         freeSpace:              0
         id:                     0000012b-d0c7-6cdc-0000-000000000000
         interleaving:           none
         lun:                    /opt/oracle/cell11.2.2.1.0_LINUX_101005/disks/raw/FLASH04
         raidLevel:              "RAID 0"
         size:                   496M
         status:                 normal

Thursday, February 17, 2011

Books: Exadata Books Coming!

Looks like couple oracle exadata books are in cooking already!

Achieving Extreme Performance with Oracle Exadata (Osborne ORACLE Press Series)


Expert Oracle Exadata



              

Exadata Performance Monitoring Part 1

CELLCLI LIST command can be used to monitor and monitor cell performance metrics

To receive some help on specific performance metric, use

list metricdefinition <metric_name> DETAIL

For Example:

CellCLI> list metricdefinition Io_LOAD DETAIL
         name:                   IO_LOAD
         description:            "Average I/O load for hard disks"
         metricType:             Instantaneous
         objectType:             CELL
         unit:                   Number


To display specific metric value, use:

CellCLI> LIST METRICCURRENT CL_CPUT DETAIL
         name:                   CL_CPUT
         alertState:             normal
         collectionTime:         2011-02-17T00:39:32-08:00
         metricObjectName:       cell
         metricType:             Instantaneous
         metricValue:            30.6 %
         objectType:             CELL

LIST command also accepts where clause (attribute filters):

To Show all immediate non-zero performance metric values:

LIST METRICCURRENT where metricValue > 0

CD_IO_TM_R_LG   CD_disk02_cell                                  36,321 us
CD_IO_TM_R_SM   CD_disk01_cell                                  1,347,527 us
CD_IO_TM_R_SM   CD_disk02_cell                                  1,151,048 us
CD_IO_TM_R_SM   CD_disk03_cell                                  931,721 us
CD_IO_TM_R_SM   CD_disk04_cell                                  2,286,866 us
CD_IO_TM_R_SM   CD_disk05_cell                                  958,574 us


To show historical data, use LIST METRICHISTORY Command:

You can use metricobjectname to show specific disk and collectiontime to show specific timeframe.

Make sure to filter data as default "list metrichistory" would provide too much data to comprehend

CellCLI> list metrichistory CD_IO_RQ_W_SM where metricobjectname = CD_disk01_cell and collectiontime > '2011-02-17T00:53:01-08:00' and metricvalue > 0

CD_IO_RQ_W_SM   CD_disk01_cell  350,071 IO requests  2011-02-17T00:53:19-08:00
CD_IO_RQ_W_SM   CD_disk01_cell  356,091 IO requests  2011-02-17T00:54:19-08:00
CD_IO_RQ_W_SM   CD_disk01_cell  358,111 IO requests  2011-02-17T00:55:19-08:00
CD_IO_RQ_W_SM   CD_disk01_cell  381,131 IO requests  2011-02-17T00:56:19-08:00
CD_IO_RQ_W_SM   CD_disk01_cell  454,151 IO requests  2011-02-17T00:57:19-08:00
CD_IO_RQ_W_SM   CD_disk01_cell  470,171 IO requests  2011-02-17T00:58:19-08:00
CD_IO_RQ_W_SM   CD_disk01_cell  640,191 IO requests  2011-02-17T00:59:20-08:00
CD_IO_RQ_W_SM   CD_disk01_cell  850,211 IO requests  2011-02-17T01:00:20-08:00

Sunday, December 5, 2010

Oracle: Internal structure of "date" data type

Did some investigation on internal date data format:

A date in binary format contains seven bytes, as shown in table below:

BYTE          1    2     3   4    5      6      7
Meaning Century Year Month Day Hour Minute Second
Example     120  110    11  30   16     18      1


The century and year bytes are in an excess–100 notation.

Dates Before Common Era (BCE) are less than 100.

The era begins on 01–JAN–4712 BCE, which is Julian day 1.

For this date, the century byte is 53, and the year byte is 88.

The hour, minute, and second bytes are in excess–1 notation.

The hour byte ranges from 1 to 24, the minute and second bytes from 1 to 60.

If no time was specified when the date was created, the time defaults to midnight (1, 1, 1).

You can use DUMP function to investigate this:

 SQL> create table t ( x date );

Table created.

SQL> insert into t values ( to_date( '03-nov-2010 3:17:01pm', 'dd-mon-yyyy hh:mi:sspm' ) );
1 row created.
SQL> commit; 


SQL> select dump(x) from t;
DUMP(X)
--------------------------------------------------------------------------------
Typ=12 Len=7: 120,110,11,3,16,18,2

Friday, November 26, 2010

tidbit:Oracle:Exadata Wait Events (cell related)

Exadata Wait Events:

NAME                                                             WAIT_CLASS
---------------------------------------------------------------- --------------------------------------------------


cell list of blocks physical read                                User I/O
cell manager cancel work request                                 Other
cell manager closing cell                                        System I/O
cell manager discovering disks                                   System I/O
cell manager opening cell                                        System I/O
cell multiblock physical read                                    User I/O
cell single block physical read                                  User I/O
cell smart file creation                                         User I/O
cell smart flash unkeep                                          Other
cell smart incremental backup                                    System I/O
cell smart index scan                                            User I/O
cell smart restore from backup                                   System I/O
cell smart table scan                                            User I/O
cell statistics gather                                           User I/O
cell worker idle                                                 Idle
cell worker online completion                                    Other
cell worker retry                                                Other

tidbit:Oracle:Exadata System Statistics (cell related)

Exadata cell statistics:

STATISTIC# NAME                                                                  CLASS
---------- ---------------------------------------------------------------- ----------
        51 cell physical IO interconnect bytes                                      64
       224 cell physical IO bytes saved during optimized file creation              64
       225 cell physical IO bytes saved during optimized RMAN file restore          64
       226 cell physical IO bytes eligible for predicate offload                    64
       227 cell physical IO bytes saved by storage index                             8
       228 cell smart IO session cache lookups                                      64
       229 cell smart IO session cache hits                                         64
       230 cell smart IO session cache soft misses                                  64
       231 cell smart IO session cache hard misses                                  64
       232 cell smart IO session cache hwm                                          64
       233 cell num smart IO sessions in rdbms block IO due to user                 64
       234 cell num smart IO sessions in rdbms block IO due to big payload          64
       235 cell num smart IO sessions using passthru mode due to user               64
       236 cell num smart IO sessions using passthru mode due to cellsrv            64
       237 cell num smart IO sessions using passthru mode due to timezone           64
       238 cell num smart file creation sessions using rdbms block IO mode          64
       239 cell physical IO interconnect bytes returned by smart scan               64
       240 cell session smart scan efficiency                                       64
       359 cell scans                                                               64
       360 cell blocks processed by cache layer                                    128
       361 cell blocks processed by txn layer                                      128
       362 cell blocks processed by data layer                                     128
       363 cell blocks processed by index layer                                    128
       364 cell commit cache queries                                                64
       365 cell transactions found in commit cache                                  64
       366 cell blocks helped by commit cache                                       64
       367 cell blocks helped by minscn optimization                                64
       368 cell blocks skipped due to chained rows                                  64
       369 cell simulated physical IO bytes eligible for predicate offload         192
       370 cell simulated physical IO bytes returned by predicate offload          192
       371 cell simulated session smart scan efficiency                            192
       372 cell CUs sent uncompressed                                               64
       373 cell CUs sent compressed                                                 64
       374 cell CUs sent head piece                                                 64
       375 cell CUs processed for uncompressed                                      64
       376 cell CUs processed for compressed                                        64
       377 cell IO uncompressed bytes                                               64
       394 cell index scans                                                         64
       571 cell flash cache read hits                                                8

Oracle:Windows:Database Upgrade Fall-back

Recently I was approached by one of the major customers to assist in creation of database fall-back procedure for a Windows based Oracle 10g RAC system, that is using OCFS to store database data.

The objective was to upgrade clusterware and database from Oracle 10.2.0.2 to 10.2.0.4 and on top of it apply patch 38. Obviously, both ORACLE_HOMES and database itself were to be upgraded.

In a large company world, I would suggest to create separate RAC cluster, setup standby database there and during the brief outage copy control files and redo logs, start-up database and be done with it. This way the fall-back is very simple (start using old system) and most of the work (installing and patching software) can be done without any production impact.

The problem was that customer engaged us only after their fall-back test totally failed and all outage process was based on actually upgrading production system (software and database).

The major issue was that Oracle provided fall-back failed, as downgrade did not worked as expected, as well as nothing was said about proper downgrade of ORACLE_HOMES.

At the end, we arrived to below fall-back procedure:

1) Shutdown all database related Windows services
2) Shutdown clusterware related Windows services
3) Kill ons.exe processes - they sometimes left hanging
4) Take system state backup (IMPORTANT)
5) Backup ocfs and oraclefence drivers
6) Backup database and clusterware Oracle homes
7) Backup Oracle Inventory
8) Backup control files and redo logs
9) Backup OCR using ocrconfig -export <exp_ocr_file.bkp>
10) Backup Voting disk using 'ocopy <voting_file> <exp_voting_file.bkp>'
11) Backup config files (init/tnsnames/listener/sqlnet) (just in case)

During fall-back exercise, we were able to:

1) Restore system state backup in 3 minutes
2) Restore drivers in 3 min (manual copy)
3) Oracle Homes and Inventory Restore was done using rename (1 minute)
4) OCR was restored using "ocrconfig -import" (3 minutes), no need to restore voting disk
5) Database restore from RMAN took 40-50 minutes
6) Redo restore (50GB) took 15 minutes in parallel with database restore
7) Control file restore took 1 minute
8) Database recovery and startup took 5 minutes

Fall-back was 100% successful

Thursday, November 25, 2010

Re: Exadata Oracle performance troubleshooting

Just stumbled upon a great article related to Oracle problem troubleshooting on Exadata:

Issue is related to "TABLE ACCESS STORAGE FULL" in exec plan (smart scans) and tables with the very large number of columns, which is typical for any serious production environment.

See below:

http://tech.e2sn.com/oracle/exadata/performance-troubleshooting/exadata-smart-scan-performance

Oracle 11G: Reading data directly from oracle instance memory using "oradebug direct_access"

Use Oradebug direct_access to directly read data from x$tables

SQL> oradebug help direct_access

DIRECT_ACCESS  <set/enable/disable command | select query> Fixed table access

oradebug direct_access [enable/disable] trace - to start/stop writing data to trace file

Select Sample - getting information about library cache:

SQL> oradebug setmypid
Statement processed.


SQL> oradebug direct_access SELECT * FROM X$KSMLRU
ADDR    = D28F85C
INDX    = 0
INST_ID = 1
KSMLRIDX        = 1
KSMLRDUR        = 0
KSMLRSHRPOOL    = 1
KSMLRCOM        =
KSMLRSIZ        = 0
KSMLRNUM        = 0
KSMLRHON        =
KSMLROHV        = 0
KSMLRSES        = 0
KSMLRADU        = 0
KSMLRNID        = 0
KSMLRNSD        = 0
KSMLRNCD        = 0
KSMLRNED        = 0


"Where" clause is not supported:

SQL> oradebug direct_access SELECT * FROM X$KSMLRU WHERE KSMLRSIZ > 0
ORA-00933: SQL command not properly ended

Tuesday, November 23, 2010

Great Oracle Internals book

Oracle8i Internal Services for Waits, Latches, Locks, and Memory
Out of number of great "Oracle internals" book this definitely stands out. Ignore 8i - wealth of information in this book deserves it to be read and reread.

I met author in 2001-2002 and he is a shy and very smart DBA from Australia. His website http://www.ixora.com.au/ is a treasure of Oracle internals information and should be studied and used by every reasonable good Oracle DBA.

This Book explain in a great depth internal work of different latches and locks - knowledge which DBA must have to resolve and identify root causes of very complex contentions.

Sunday, November 21, 2010

Technical areas I will write about

I was lucky to start working in database technology since very early - early versions of Oracle (v6), DB2 (2.1 on mainframe), SQL Server (v 6), MySQL (v3), Sybase and Teradata.

Arrival of scalable  NOSQL databases and a cloud puts a new life into database technology - what we can do with such databases as Hadoop, Simple DB, MongoDb, Cassandra opens new horizons!

I also start working with Internet technologies very early - sill remember using alpha version of Netscape over PPP and slirp. Was a beta user of Java (still holding SunSoft developer box with the first version of java) and prior to leaving to US (from Israel, mid 1995) was approached by Israeli SUN subsidiary to lead JDBC development (to create JDBC that was not existent at that time).

Using database and java technology while working with large insurance companies (Principal Financial Group, Allied Insurance, Nationwide Insurance), large Internet companies (eBay and Yahoo), supporting major Banks (like SVB) and Casinos (Like WYNN) creates great perspective and knowledge of what is working and not working under very high production load, what architectures are scalable and what is not scalable and should be broken into scalable pieces ASAP and how to handle critical production problems in real time.  All this I believe is worth sharing.

I also accumulated huge library of technical tidbits that can be very useful to DBAs, Application Server Admins and well as to System Admins.

Recently I work with data masking technology and find it to be very interesting area to cover as well.

I am sure I'll add more later - but this looks like a good start