Monday, May 19, 2014

Oracle RAC Interview Questions part 3

Oracle DBA Interview Questions and Answers - RAC
Oracle RAC Interview Questions and Answers 

How does OCSSD starts first if voting disk & OCR resides in ASM Diskgroups?
You might wonder how CSSD, which is required to start the clustered ASM instance, can be started if voting disks are stored in ASM?
This sounds like a chicken-and-egg problem:
without access to the voting disks there is no CSS, hence the node cannot join the cluster.
But without being part of the cluster, CSSD cannot start the ASM instance.
To solve this problem the ASM disk headers have new metadata in 11.2:
you can use kfed to read the header of an ASM disk containing a voting disk.
The kfdhdb.vfstart and kfdhdb.vfend fields tell CSS where to find the voting file. This does not require the ASM instance to be up.
Once the voting disks are located, CSS can access them and joins the cluster.

What is gsdctl in RAC? list gsdctl commands in Oracle RAC?
GSDCTL stands for Global Service Daemon Control, we can use gsdctl commands to start, stop, and obtain the status of the GSD service on any platform.

The options for gsdctl are:-
$ gsdctl start -- To start the GSD service
$ gsdctl stop  -- To stop the GSD service
$ gsdctl stat  -- To obtain the status of the GSD service

Log file location for gsdctl:
$ ORACLE_HOME/srvm/log/gsdaemon_node_name.log

What is RAC?
RAC stands for Real Application cluster.
It is a clustering solution from Oracle Corporation that ensures high availability of databases by providing instance failover, media failover features.
Oracle RAC is a cluster database with a shared cache architecture that overcomes the limitations of traditional shared-nothing and shared-disk approaches to provide a highly scalable and available database solution for all the business applications.
Oracle RAC provides the foundation for enterprise grid computing.

What is Oracle RAC One Node?
Oracle RAC one Node is a single instance running on one node of the cluster while the 2nd node is in cold standby mode. If the instance fails for some reason then RAC one node detect it and restart the instance on the same node or the instance is relocate to the 2nd node incase there is failure or fault in 1st node. The benefit of this feature is that it provides a cold failover solution and it automates the instance relocation without any downtime and does not need a manual intervention. Oracle introduced this feature with the release of 11gR2 (available with Enterprise Edition).

What is RAC and how is it different from non RAC databases?
Oracle Real Application clusters allows multiple instances to access a single database, the instances will be running on multiple nodes.
In Real Application Clusters environments, all nodes concurrently execute transactions against the same database.
Real Application Clusters coordinates each node's access to the shared data to provide consistency and integrity.

What are the advantages of RAC (Real Application Clusters)?
Reliability - if one node fails, the database won't fail
Availability - nodes can be added or replaced without having to shutdown the database
Scalability - more nodes can be added to the cluster as the workload increases

What is Oracle RAC One Node?
Oracle RAC one Node is a single instance running on one node of the cluster while the 2nd node is in cold standby mode. If the instance fails for some reason then RAC one node detect it and restart the instance on the same node or the instance is relocate to the 2nd node incase there is failure or fault in 1st node. The benefit of this feature is that it provides a cold failover solution and it automates the instance relocation without any downtime and does not need a manual intervention. Oracle introduced this feature with the release of 11gR2 (available with Enterprise Edition).

What is Cache Fusion?
Oracle RAC is composed of two or more instances. When a block of data is read from datafile by an instance within the cluster and another instance is in need of the same block, it is easy to get the block image from the instance which has the block in its SGA rather than reading from the disk. To enable inter instance communication Oracle RAC makes use of interconnects. The Global Enqueue Service (GES) monitors and Instance enqueue process manages the cache fusion.

What command would you use to check the availability of the RAC system?
crs_stat -t -v (-t -v are optional)

How do we verify that RAC instances are running?
SQL>select * from V$ACTIVE_INSTANCES;
The query gives the instance number under INST_NUMBER column,host_:instancename under INST_NAME column.

How can you connect to a specific node in a RAC environment?
tnsnames.ora ensure that you have INSTANCE_NAME specified in it.

Which is the "MASTER NODE" in RAC?
The node with the lowest node number will become master node and dynamic remastering of the resources will take place.
To find out the master node for particular resource, you can query v$ges_resource for MASTER_NODE column.
To find out which is the master node, you can see ocssd.log file and search for "master node number".
when the first master node fails in the cluster the lowest node number will become master node.

What components in RAC must reside in shared storage?
All datafiles, controlfiles, SPFIles, redo log files must reside on cluster-aware shred storage.

Give few examples for solutions that support cluster storage?
·ASM (automatic storage management),
·Raw disk devices,
·Network file system (NFS),
·OCFS2 and
·OCFS (Oracle Cluster Fie systems).

What are Oracle Cluster Components?
1.Cluster Interconnect (HAIP)
2.Shared Storage (OCR/Voting Disk)
3.Clusterware software
4.Oracle Kernel Components

What are Oracle RAC Components?
VIP, Node apps etc.

What are Oracle Kernel Components?
Basically Oracle kernel need to switched on with RAC On option when you convert to RAC, that is the difference as it facilitates few RAC bg process like LMON,LCK,LMD,LMS etc.

How to turn on RAC?
# link the oracle libraries
$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk rac_on
# rebuild oracle
$ cd $ORACLE_HOME/bin
$ relink oracle

Disk architechture in RAC?
SAN (Storage Area Networks) - generally using fibre to connect to the SAN
NAS (Network Attached Storage) - generally using a network to connect to the NAS using either NFS, ISCSI

What is Oracle Clusterware?
The Clusterware software allows nodes to communicate with each other and forms the cluster that makes the nodes work as a single logical server.
The software is run by the Cluster Ready Services (CRS) using the Oracle Cluster Registry (OCR) that records and maintains the cluster and node membership information and the voting disk which acts as a tiebreaker during communication failures. Consistent heartbeat information travels across the interconnect to the voting disk when the cluster is running.



Real Application Clusters
Oracle RAC is a cluster database with a shared cache architecture that overcomes the limitations of traditional shared-nothing and shared-disk approaches to provide a highly scalable and available database solution for all your business applications. Oracle RAC provides the foundation for enterprise grid computing.

Oracle’s Real Application Clusters (RAC) option supports the transparent deployment of a single database across a cluster of servers, providing fault tolerance from hardware failures or planned outages. Oracle RAC running on clusters provides Oracle’s highest level of capability in terms of availability, scalability, and low-cost computing.

One DB opened by multipe instances so the the db ll be Highly Available if an instance crashes.
Cluster Software. Oracles Clusterware or products like Veritas Volume Manager are required to provide the cluster support and allow each node to know which nodes belong to the cluster and are available and with Oracle Cluterware to know which nodes have failed and to eject then from the cluster, so that errors on that node can be cleared.

Oracle Clusterware has two key components Cluster Registry OCR and Voting Disk.

The cluster registry holds all information about nodes, instances, services and ASM storage if used, it also contains state information ie they are available and up or similar.

The voting disk is used to determine if a node has failed, i.e. become separated from the majority. If a node is deemed to no longer belong to the majority then it is forcibly rebooted and will after the reboot add itself again the the surviving cluster nodes.

What are the Oracle Clusterware key components?
Oracle Clusterware has two key components Cluster Registry OCR and Voting Disk.

What is Voting Disk and OCR?
Voting Disk
Oracle RAC uses the voting disk to manage cluster membership by way of a health check and arbitrates cluster ownership among the instances in case of network failures. The voting disk must reside on shared disk.
A node must be able to access more than half of the voting disks at any time.
For example, if you have 3 voting disks configured, then a node must be able to access at least two of the voting disks at any time. If a node cannot access the minimum required number of voting disks it is evicted, or removed, from the cluster.

Oracle Cluster Registry (OCR)
The cluster registry holds all information about nodes, instances, services and ASM storage if used, it also contains state information ie they are available and up or similar.
The OCR must reside on shared disk that is accessible by all of the nodes in your cluster.

What are the administrative tasks involved with voting disk?
Following administrative tasks are performed with the voting disk :
1) Backing up voting disks
2) Recovering Voting disks
3) Adding voting disks
4) Deleting voting disks
5) Moving voting disks

Can you add voting disk online? Do you need voting disk backup?
Yes,  as per documentation, if you have multiple voting disk you can add online, but if you have only one voting disk , by that cluster will be down as its lost you just need to start crs in exclusive mode and add the votedisk using
crsctl add votedisk <path>

What is the Oracle Recommendation for backing up voting disk?
Oracle recommends us to use the dd command to backup the voting disk with a minimum block size of 4KB.

How do we backup voting disks?
1) Oracle recommends that you back up your voting disk after the initial cluster creation and after we complete any node addition or deletion procedures.
2) First, as root user, stop Oracle Clusterware (with the crsctl stop crs command) on all nodes. Then, determine the current voting disk by issuing the following command:
crsctl query votedisk css
3) Then, issue the dd or ocopy command to back up a voting disk, as appropriate.
Give the syntax of backing up voting disks:-
On Linux or UNIX systems:
dd if=voting_disk_name of=backup_file_name
where,
voting_disk_name is the name of the active voting disk
backup_file_name is the name of the file to which we want to back up the voting disk contents
On Windows systems, use the ocopy command:
copy voting_disk_name backup_file_name

How do we verify an existing current backup of OCR?
We can verify the current backup of OCR using the following command : ocrconfig -showbackup

You have lost OCR disk, what is your next step?
The cluster stack will be down due to the fact that cssd is unable to maintain the integrity, this is true in 10g, From 11gR2 onwards, the crsd stack will be down, the hasd still up and running. You can add the ocr back by restoring the automatic backup or import the manual backup,

What are the major RAC wait events?
In a RAC environment the buffer cache is global across all instances in the cluster and hence the processing differs.The most common wait events related to this are gc cr request and gc buffer busy

GC CR request :the time it takes to retrieve the data from the remote cache
Reason: RAC Traffic Using Slow Connection or Inefficient queries (poorly tuned queries will increase the amount of data blocks requested by an Oracle session. The more blocks requested typically means the more often a block will need to be read from a remote instance via the interconnect.)

GC BUFFER BUSY: It is the time the remote instance locally spends accessing the requested data block.

What do you do if you see GC CR BLOCK LOST in top 5 Timed Events in AWR Report?
This is most likely due to a fault in interconnect network.
Check netstat -s
if you see "fragments dropped" or "packet reassemblies failed" , Work with your system administrator find the fault with network.

How do you troubleshoot node reboot?
Please check metalink ...
Note 265769.1 Troubleshooting CRS Reboots
Note.559365.1 Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions.

Srvctl cannot start instance, I get the following error PRKP-1001 CRS-0215, however sqlplus can start it on both nodes? How do you identify the problem?
Set the environmental variable SRVM_TRACE to true.. And start the instance with srvctl. Now you will get detailed error stack.

What are Oracle Clusterware processes for 10g on Unix and Linux?
Cluster Synchronization Services (ocssd) — Manages cluster node membership and runs as the oracle user; failure of this process results in cluster restart.

Cluster Ready Services (crsd) — The crs process manages cluster resources (which could be a database, an instance, a service, a Listener, a virtual IP (VIP) address, an application process, and so on) based on the resource's configuration information that is stored in the OCR. This includes start, stop, monitor and failover operations. This process runs as the root user

Event manager daemon (evmd) —A background process that publishes events that crs creates.

Process Monitor Daemon (OPROCD) —This process monitor the cluster and provide I/O fencing. OPROCD performs its check, stops running, and if the wake up is beyond the expected time, then OPROCD resets the processor and reboots the node. An OPROCD failure results in Oracle Clusterware restarting the node. OPROCD uses the hangcheck timer on Linux platforms.

RACG (racgmain, racgimon) —Extends clusterware to support Oracle-specific requirements and complex resources. Runs server callout scripts when FAN events occur.

What are Oracle database background processes specific to RAC?
Oracle RAC is composed of two or more database instances. They are composed of Memory structures and background processes same as the single instance database.Oracle RAC instances use two processes GES(Global Enqueue Service), GCS(Global Cache Service) that enable cache fusion.Oracle RAC instances are composed of following background processes:
ACMS—Atomic Controlfile to Memory Service (ACMS)
GTX0-j—Global Transaction Process
LMON—Global Enqueue Service Monitor
LMD—Global Enqueue Service Daemon
LMS—Global Cache Service Process
LCK0—Instance Enqueue Process
RMSn—Oracle RAC Management Processes (RMSn)
RSMN—Remote Slave Monitor
To ensure that each Oracle RAC database instance obtains the block that it needs to satisfy a query or transaction, Oracle RAC instances use two processes, the Global Cache Service (GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the statuses of each data file and each cached block using a Global Resource Directory (GRD). The GRD contents are distributed across all of the active instances.

What is GRD?
GRD stands for Global Resource Directory. The GES and GCS maintains records of the statuses of each datafile and each cahed block using global resource directory.This process is referred to as cache fusion and helps in data integrity.

What is ACMS?
ACMS stands for Atomic Controlfile Memory Service.In an Oracle RAC environment ACMS is an agent that ensures a distributed SGA memory update(ie)SGA updates are globally committed on success or globally aborted in event of a failure.

What is SCAN listener?
A scan listener is something that additional to node listener which listens the incoming db connection requests from the client which got through the scan IP, it got end points configured to node listener where it routes the db connection requests to particular node listener.

SCAN IP can be disabled if not required. However SCAN IP is mandatory during the RAC installation. Enabling/disabling SCAN IP is mostly used in oracle apps environment by the concurrent manager (kind of job scheduler in oracle apps).
Steps to disable the SCAN IP,
i.  Do not use SCAN IP at the client end.
ii. Stop scan listener
    srvctl stop scan_listener
iii.Stop scan
    srvctl stop scan (this will stop the scan vip's)
iv. Disable scan and disable scan listener
    srvctl disable scan

What are the different network components are in 10g RAC?
public, private, and vip components
Private interfaces is for intra node communication.
VIP is all about availability of application. When a node fails then the VIP component fail over to some other node, this is the reason that all applications should based on vip components means tns entries should have vip entry in the host list

What is an interconnect network?
An interconnect network is a private network that connects all of the servers in a cluster. The interconnect network uses a switch/multiple switches that only the nodes in the cluster can access.

What is the use of cluster interconnect?
Cluster interconnect is used by the Cache fusion for inter instance communication.

How can we configure the cluster interconnect?
· Configure User Datagram Protocol (UDP) on Gigabit Ethernet for cluster interconnects.
· On UNIX and Linux systems we use UDP and RDS (Reliable data socket) protocols to be used by Oracle Clusterware.
· Windows clusters use the TCP protocol.

What is the purpose of Private Interconnect?
Clusterware uses the private interconnect for cluster synchronization (network heartbeat) and daemon communication between the the clustered nodes. This communication is based on the TCP protocol.
RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP). Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches of participating nodes in the cluster.

What is a virtual IP address or VIP?
A virtual IP address or VIP is an alternate IP address that the client connections use instead of the standard public IP address. To configure VIP address, we need to reserve a spare IP address for each node, and the IP addresses must use the same subnet as the public network.

What is the use of VIP?
If a node fails, then the node's VIP address fails over to another node on which the VIP address can accept TCP connections but it cannot accept Oracle connections.

Why do we have a Virtual IP (VIP) in Oracle RAC?
Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 min) before getting an error. As a result, you don't really have a good HA solution without using VIPs.
When a node fails, the VIP associated with it is automatically failed over to some other node and new node re-arps the world indicating a new MAC address for the IP. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately.

Give situations under which VIP address failover happens?
VIP addresses failover happens when the node on which the VIP address runs fails; all interfaces for the VIP address fails, all interfaces for the VIP address are disconnected from the network.

What is the significance of VIP address failover?
When a VIP address failover happens, Clients that attempt to connect to the VIP address receive a rapid connection refused error .They don't have to wait for TCP connection timeout messages.

What is the use of a service in Oracle RAC environment?
Applications should use the services feature to connect to the Oracle database. Services enable us to define rules and characteristics to control how users and applications connect to database instances.

What are the characteristics controlled by Oracle services feature?
The characteristics include a unique name, workload balancing, failover options, and high availability.

What enables the load balancing of applications in RAC?
Oracle Net Services enable the load balancing of application connections across all of the instances in an Oracle RAC database.

What are the types of connection load-balancing?
Connection Workload management is one of the key aspects when you have RAC instances as you want to distribute the connections to specific nodes/instance or those have less load.
There are two types of connection load-balancing:
1.Client Side load balancing (also called as connect time load balancing)
2.Server side load balancing (also called as Listener connection load balancing)

What is the difference between server-side and client-side connection load balancing?
Client-side balancing happens at client side where load balancing is done using listener.In case of server-side load balancing listener uses a load-balancing advisory to redirect connections to the instance providing best service.

Client Side load balancing:- Oracle client side load balancing feature enables clients to randomize the connection requests among all the available listeners based on their load.

An tns entry that contains all nodes entries and use load_balance=on (default its on) will use the connect time load balancing or client side load balancing.

Sample Client Side TNS Entry:-

    finance =
    (DESCRIPTION =
         (ADDRESS = (PROTOCOL = TCP)(HOST = myrac2-vip)(PORT = 2042))
         (ADDRESS = (PROTOCOL = TCP)(HOST = myrac1-vip)(PORT = 2042))
         (ADDRESS = (PROTOCOL = TCP)(HOST = myrac3-vip)(PORT = 2042))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
         (SERVER = DEDICATED)
         (SERVICE_NAME = FINANCE) (FAILOVER=ON)
    (FAILOVER_MODE =  (TYPE = SELECT) (METHOD = BASIC) (RETRIES = 180) (DELAY = 5))
    )
    )

Server side load balancing:- This improves the connection performance by balancing the number of active connections among multiple instances and dispatchers. In a single instance environment (shared servers), the listener selects the least dispatcher to handle the incoming client requests. In a rac environments, PMON is aware of all instances load and dispatchers , and depending on the load information PMON redirects the connection to the least loaded node.

In a RAC environment, *.remote_listener parameter which is a tns entry containing all nodes addresses need to set to enable the load balance advisory updates to PMON.

Sample Tns entry should be in an instances of RAC cluster,

    local_listener=LISTENER_MYRAC1
    remote_listener = LISTENERS_MYRACDB

What are the administrative tools used for Oracle RAC environments?
Oracle RAC cluster can be administered as a single image using the below
·       OEM (Enterprise Manager),
·       SQL*PLUS,
·       Server control (SRVCTL),
·       Cluster Verification Utility (CLUVFY),
·       DBCA,
·       NETCA

Name some Oracle Clusterware tools and their uses?
·OIFCFG - allocating and deallocating network interfaces.
·OCRCONFIG - Command-line tool for managing Oracle Cluster Registry.
·OCRDUMP - Identify the interconnect being used.
·CVU - Cluster verification utility to get status of CRS resources.

What is the difference between CRSCTL and SRVCTL?
crsctl manages clusterware-related operations:
    Starting and stopping Oracle Clusterware
    Enabling and disabling Oracle Clusterware daemons
    Registering cluster resources

srvctl manages Oracle resource–related operations:
    Starting and stopping database instances and services
    Also from 11gR2 manages the cluster resources like network,vip,disks etc

No comments:

Post a Comment