Patching Oracle Exalogic II

Category: Oracle

Patching Oracle Exalogic – updating the IB Gateway switches

In my last post I promised to write more about the details of patching, using patch set update 4 (the January 2012 patch set update, 13113092) as an example. So let’s get started on patching the infrastructure by looking at updates for the InfiniBand Gateway switches.

I will demonstrate that these switches can be upgraded in a rolling fashion, without interrupting network services (except for a few seconds), so that the Exalogic stays online throughout!

1.2     Patching the Gateway switches
The first thing to note is that infrastructure patching is done as user root, not weblogic. After unzipping the el_infrastructure_10022.zip file (see my previous post on patching) we find the following:

 

    [root@xxxxcn1 ~]# cd /u01/common/patches/todo/13113092/Infrastructure/1.0.0.2.2/
    [root@xxxxcn1 1.0.0.2.2]$ ls
    BaseImage  NM2-36p  NM2-GW  one-command  README.html  README.txt  ZFS_Storage_7320

The first step is careful preparation: read the provided README.html file thoroughly and check My Oracle Support (MOS) for additional information such as upgrade advisors, e.g. the “Exalogic January 2012 PSU Infrastructure Upgrade Guide [ID 1392684.1]”. Be sure to also check the “known issues” document for your PSU.
Then we check versions to see whether a component update is needed at all. Since the patch set is cumulative, some of the updates may already have been applied earlier.

The README.html file for the infrastructure part says:

If you are running either v1.0.0.0.0 or v1.0.0.1.0 of Exalogic Infrastructure, you must apply all the infrastructure patches/upgrades included in this PSU in the following order:

1. Exalogic Infrastructure
   a. InfiniBand Gateway Switch (NM2-GW)
   b. InfiniBand Switch 36 (NM2-36p)
   c. ZFS Storage Appliance (ZFS_Storage_7320)
         i.   Q3.2
         ii.  ILOM on the storage head
         iii. Q3.3
   d. Base Image v1.0.0.2.2 (rolling update, node at a time)
2. Exalogic Configuration Utility (ECU, previously called one-command)

Summarizing, the order of patching is as follows: first the network switches, then the storage appliance, then the OS on the compute nodes. Since we have a quarter rack configuration, there is no NM2-36p switch installed, so we don’t have to update it. We only have to update the two NM2-GW switches in our rack.
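To make the ordering concrete, here is a minimal Python sketch that keeps the README order and drops the components a given rack does not have. The component keys simply reuse the directory names from the patch kit; the function name and the contents of the quarter_rack set are my own choices for illustration, not part of any Oracle tooling.

```python
# Patch order as listed in the README. The keys reuse the patch kit
# directory names; the descriptions are only labels for this sketch.
PATCH_ORDER = [
    ("NM2-GW", "InfiniBand Gateway Switch"),
    ("NM2-36p", "InfiniBand Switch 36"),
    ("ZFS_Storage_7320", "ZFS Storage Appliance"),
    ("BaseImage", "Base Image (rolling update, node at a time)"),
    ("one-command", "Exalogic Configuration Utility"),
]


def plan_for(installed):
    """Return the README order restricted to the components present in the rack."""
    return [name for name, _ in PATCH_ORDER if name in installed]


# Assumption: a quarter rack has everything except the NM2-36p switch.
quarter_rack = {"NM2-GW", "ZFS_Storage_7320", "BaseImage", "one-command"}
print(plan_for(quarter_rack))
```

Keeping the order in one list and filtering it means the sequence itself can never be shuffled by accident, only shortened.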

1.3     Checking current versions on the switches
Now, we first check the current software versions for the IB gateway switches. The README says the following:
This section contains instructions on upgrading NM2-GW InfiniBand Gateway switches in an Exalogic rack from version 1.1.2-3 (factory default on Exalogic X2-2 racks shipped with either v1.0.0.0.0 or v1.0.0.1.0 of the Exalogic Base Image) to version 1.3.2-1. 
After logging in as root, we can use the version command to check the software version:

    [root@xxxxgw1 ~]# version
    SUN DCS gw version: 1.3.2-1
    Build time: Feb 17 2011 10:02:40
    FPGA version: 0x33
    SP board info:
    Manufacturing Date: 2010.12.30
    Serial Number: "NCD600077"
    Hardware Revision: 0x0006
    Firmware Revision: 0x0000
    BIOS version: SUN0R100
    BIOS date: 06/22/2010

    [root@xxxxgw2 ~]# version
    SUN DCS gw version: 1.3.2-1
    Build time: Feb 17 2011 10:02:40
    FPGA version: 0x33
    SP board info:
    Manufacturing Date: 2010.12.31
    Serial Number: "NCD600233"
    Hardware Revision: 0x0006
    Firmware Revision: 0x0000
    BIOS version: SUN0R100
    BIOS date: 06/22/2010

As it turns out, this particular patch set update is not well suited to demonstrating updates of the InfiniBand Gateway switches in our case, as we had already reached the required patch level (1.3.2-1) by applying the October 2011 patch set 12825625. Instead, I will take the upgrade to version 2.0.0.0.0 (patch 13795376) as an example here. For this update, the InfiniBand Gateway switches have to be upgraded to SUN DCS version 2.0.4-1.
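The "do we need this update at all?" check can be scripted. The following is a rough Python sketch that pulls the "SUN DCS gw version" value out of captured version output and compares it with a target version. The function names are hypothetical, and I am assuming the version always has the dotted-numbers-dash-number shape seen above.

```python
import re


def parse_gw_version(version_output):
    """Turn 'SUN DCS gw version: 1.3.2-1' into the tuple (1, 3, 2, 1)."""
    match = re.search(r"SUN DCS gw version:\s*(\d+(?:\.\d+)*)-(\d+)", version_output)
    if match is None:
        raise ValueError("no 'SUN DCS gw version' line found")
    dotted = tuple(int(part) for part in match.group(1).split("."))
    return dotted + (int(match.group(2)),)


def needs_upgrade(version_output, target):
    """True when the switch reports a version lower than the target, e.g. '2.0.4-1'."""
    current = parse_gw_version(version_output)
    wanted = parse_gw_version("SUN DCS gw version: " + target)
    return current < wanted


sample = "SUN DCS gw version: 1.3.2-1\nBuild time: Feb 17 2011 10:02:40\n"
print(needs_upgrade(sample, "1.3.2-1"))  # already at the January 2012 PSU level
print(needs_upgrade(sample, "2.0.4-1"))  # the 2.0.0.0.0 upgrade still applies
```

Comparing tuples of integers rather than raw strings avoids the classic trap where "1.10" sorts before "1.9".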

First we have to do a number of prerequisite checks, which I will not cover here (but which are important to make sure the update goes through flawlessly). Then we upgrade the two gateway switches in a rolling fashion, so that network services are not interrupted and users and applications can keep working. We do this by first upgrading the switch that is not the active master switch. Let’s find out which of the two has this role:

    [root@xxxxgw1 ~]# getmaster
    Local SM enabled and running
    20120117 10:03:08 Master SubnetManager on sm lid 27 sm guid 0x2128be561ac0a0 : SUN IB QDR GW switch xxxxgw2

    [root@xxxxgw2 ~]# getmaster
    Local SM enabled and running
    20120117 10:03:20 Master SubnetManager on sm lid 27 sm guid 0x2128be561ac0a0 : SUN IB QDR GW switch xxxxgw2

OK, gateway number 2 (GW02) is the master switch at present. That means we should upgrade the GW01 switch first, have them switch roles and then upgrade GW02 to finish up.
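If you capture the getmaster output in a script, the decision above can be derived automatically. Here is a small Python sketch, assuming the output always contains a "Master SubnetManager ... : SUN IB QDR GW switch <name>" line like the one shown; the function names are mine and purely illustrative.

```python
import re

# Matches the master line of getmaster; \s* also tolerates a wrapped line.
MASTER_RE = re.compile(
    r"Master SubnetManager on sm lid (\d+) sm guid (0x[0-9a-fA-F]+)\s*:\s*"
    r"SUN IB QDR GW switch (\S+)"
)


def master_switch(getmaster_output):
    """Return the host name of the switch that currently runs the master SM."""
    match = MASTER_RE.search(getmaster_output)
    if match is None:
        raise ValueError("no master subnet manager line found")
    return match.group(3)


def upgrade_order(switches, getmaster_output):
    """Non-master switches first, the current master last."""
    master = master_switch(getmaster_output)
    if master not in switches:
        raise ValueError("master %r is not one of %r" % (master, switches))
    return [s for s in switches if s != master] + [master]


sample = (
    "Local SM enabled and running\n"
    "20120117 10:03:08 Master SubnetManager on sm lid 27 "
    "sm guid 0x2128be561ac0a0 : SUN IB QDR GW switch xxxxgw2\n"
)
print(upgrade_order(["xxxxgw1", "xxxxgw2"], sample))
```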

1.4     Upgrading GW01
The patch file is loaded via FTP from the Exalogic storage, where we set up an FTP user called patcher for this in advance. The README for the 2.0.0.0.0 upgrade (very similar to the README for the January 2012 PSU, but a little more elaborate) states the following:

To upgrade the secondary NM2-GW switches, complete the following steps:

1. Switch to the ILOM shell by running the spsh command on the command line:

    # spsh
    ->

2. Ensure that you have created the patches share in the ZFS storage appliance, and enabled the FTP service on the share with the permission for root access, as described in the top-level README file, which is included in the upgrade kit.

    Load the firmware upgrade package using the command:

      -> load -source ftp://root:<root_password>@<storage_host>//<path_to_NM2-GW_fw_upgrade_binaries_on_patches_share>/sundcs_gw_repository_2.0.4_1.pkg

OK, easy enough, let’s do that:

    [root@xxxxgw1 ~]# spsh
    Oracle(R) Integrated Lights Out Manager
    Version ILOM 3.0 r47111
    Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
    -> load -source ftp://patcher@xxxxsn-priv//export/common/patches/todo/13795376/Infrastructure/2.0.0.0.0/NM2-GW/2.0.4-1/sundcs_gw_repository_2.0.4_1.pkg
    Error: URL should specify IP Address. Hostname is not supported.
    Firmware image update failed.
    load: Command Failed

Hmm, I guess we should use the IP address of the storage instead of its name. I also found that we need to supply the password directly in the URL, so we try again, and then it goes through:

    -> load -source ftp://patcher:mypassword@<ZFS storage VIP address>//export/common/patches/todo/13795376/Infrastructure/2.0.0.0.0/NM2-GW/2.0.4-1/sundcs_gw_repository_2.0.4_1.pkg
    Downloading firmware image. This will take few minutes.
    NOTE: Firmware upgrade will upgrade firmware on SUN DCS gw Kontron module,
    I4 and BridgeX. Upgrade takes few minutes to complete.
    ILOM will enter a special mode to load new firmware. No other tasks
    should be performed in ILOM until the firmware upgrade is complete.
    Are you sure you want to load the specified file (y/n)? y
    Setting up environment for firmware upgrade. This will take few minutes.
    Starting SUN DCS gw FW update
    ==========================
    Performing operation: I4 A
    ==========================
    I4 fw upgrade from 7.3.0(INI:1) to 7.4.0(INI:1):
    Upgrade started...
    Upgrade completed.
    INFO: I4 fw upgrade from 7.3.0(INI:1) to 7.4.0(INI:1) succeeded
    ==========================
    Performing operation: BX A
    ==========================
    BX fw upgrade from 8.3.3166(INI:4) to 8.4.2740(INI:5):
    Upgrade started...
    Upgrade completed.
    INFO: BX fw upgrade from 8.3.3166(INI:4) to 8.4.2740(INI:5) succeeded
    ==========================
    Performing operation: BX B
    ==========================
    BX fw upgrade from 8.3.3166(INI:4) to 8.4.2740(INI:5):
    Upgrade started...
    Upgrade completed.
    INFO: BX fw upgrade from 8.3.3166(INI:4) to 8.4.2740(INI:5) succeeded
    ===========================
    Summary of Firmware update
    ===========================
    I4 status                :  FW UPDATE - SUCCESS
    I4 update succeeded on   :  A
    I4 already up-to-date on :  none
    I4 update failed on      :  none
    BX status                :  FW UPDATE - SUCCESS
    BX update succeeded on   :  A, B
    BX already up-to-date on :  none
    BX update failed on      :  none
    =========================================
    Performing operation: SUN DCS gw firmware update
    =========================================
    SUN DCS gw Kontron module fw upgrade from 1.3.2-1 to 2.0.4-1:
    Please reboot the system to enable firmware update of Kontron module. The download of the Kontron firmware image happens during reboot.
    After system reboot, Kontron FW update progress can be monitored in browser using URL [http://GWsystem] OR at OS command line prompt by using command [telnet GWsystem 1234] where GWsystem is the hostname or IP address of SUN DCS GW.
    Firmware update is complete.
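As an aside, the two lessons from the failed attempt (use an IP address, put the password in the URL) are easy to bake into a small helper. Below is a Python sketch that assembles the -source argument and refuses host names up front. The function names are hypothetical, 192.0.2.10 is just a placeholder documentation address, and the sketch does no escaping of special characters in the password, since I have not checked how ILOM treats them.

```python
import ipaddress


def is_ip_literal(host):
    """True for an IPv4/IPv6 literal, False for a host name."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def build_load_source(user, password, host_ip, package_path):
    """Assemble the ftp:// URL in the shape used by the working load command."""
    if not is_ip_literal(host_ip):
        raise ValueError("load -source needs an IP address, got %r" % host_ip)
    # The README and the working command both use '//' before the path.
    return "ftp://%s:%s@%s//%s" % (user, password, host_ip, package_path.lstrip("/"))


pkg = ("/export/common/patches/todo/13795376/Infrastructure/2.0.0.0.0/"
       "NM2-GW/2.0.4-1/sundcs_gw_repository_2.0.4_1.pkg")
# 192.0.2.10 stands in for the ZFS storage VIP address.
print(build_load_source("patcher", "mypassword", "192.0.2.10", pkg))
```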

OK, that worked fine. Now exit the service processor shell and reboot the switch:

    -> exit
    [root@xxxxgw1 ~]# reboot -n
    Broadcast message from root (pts/0) (Tue Mar 20 10:55:25 2012):
    The system is going down for reboot NOW!
    [root@xxxxgw1 ~]# Connection to xxxxgw1.qualogy.com closed by remote host.
    Connection to xxxxgw1.qualogy.com closed.

Wait a bit for the GW01 switch to come back up, then log back in to verify the firmware and check the version:

    % ssh root@xxxxgw1.qualogy.com
    root@xxxxgw1.qualogy.com's password:
    Last login: Tue Mar 20 09:22:49 2012 from 192.168.110.219
    FW upgrade completed successfully on Tue Mar 20 11:02:32 CET 2012.
    Please run the "fwverify" CLI command to verify the new image.
    This message will be cleared on next reboot.
    You are now logged in to the root shell.
    It is recommended to use ILOM shell instead of root shell.
    All usage should be restricted to documented commands and documented
    config files.
    To view the list of documented commands, use "help" at linux prompt.
    [root@xxxxgw1 ~]# fwverify
    Checking all present packages:
    ................................................................................
    .............................................................. OK
    Checking if any packages are missing:
    .................................................................................
    ........................................................ OK
    Verifying installed files:
    ..................................................................................
    .......................................................... OK
    [root@xxxxgw1 ~]# version
    SUN DCS gw version: 2.0.4-1
    Build time: Oct 17 2011 10:04:07
    FPGA version: 0x33
    SP board info:
    Manufacturing Date: 2010.12.30
    Serial Number: "NCD600077"
    Hardware Revision: 0x0006
    Firmware Revision: 0x0000
    BIOS version: SUN0R100
    BIOS date: 06/22/2010

OK, done! There’s more checking to do, but I’ll skip it here for both clarity and brevity.
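The fwverify check shown above lends itself to automation. Here is a minimal Python sketch that reads captured fwverify output and reports, per section, whether the dotted progress line ended in OK. I only know the success output shown above, so the sketch assumes that anything not ending in OK counts as a failure; the function name is my own.

```python
def fwverify_results(output):
    """Map each fwverify section title to True when its progress line ends in OK.

    A section starts at a line ending in ':'; the last non-empty line that
    follows decides the result, so wrapped rows of dots are handled as well.
    """
    results = {}
    current = None
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if line.endswith(":"):
            current = line[:-1]
            results[current] = False
        elif current is not None and line:
            results[current] = line.endswith("OK")
    return results


sample = (
    "Checking all present packages:\n"
    "........\n"
    "...... OK\n"
    "Checking if any packages are missing:\n"
    "........ OK\n"
    "Verifying installed files:\n"
    "........ OK\n"
)
results = fwverify_results(sample)
print(all(results.values()), results)
```

Feed it only the fwverify part of a captured session, since a following shell prompt would otherwise be read as the last line of the final section.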

1.5     Switching network control from GW02 over to GW01
Now that we have successfully upgraded GW01, we can make it the master switch, so that GW02 is freed from network control duty and can be upgraded as well. We can do this by temporarily disabling the subnet manager on GW02, forcing a switchover:

    [root@xxxxgw2 ~]# disablesm
    Stopping partitiond daemon.                                [  OK  ]
    Stopping IB Subnet Manager..-.                             [  OK  ]
    SUN IB QDR GW switch xxxxgw1 192.168.110.250
    [root@xxxxgw2 ~]# getmaster
    Local SM not enabled
    20120320 10:47:39 Master SubnetManager on sm lid 12 sm guid 0x2128be529ac0a0 : SUN IB QDR GW switch xxxxgw1 192.168.110.250

1.6     Upgrading GW02
So now GW01 has become the master switch and we can upgrade GW02 in the same way. After completing the upgrade of GW02 and checking the version, we should make sure the subnet manager is re-enabled on GW02, so it can again watch GW01’s back and quickly take over control if the need arises.

    [root@xxxxgw2 ~]# enablesm
    Starting IB Subnet Manager.                                [  OK  ]
    Starting partitiond daemon.                                [  OK  ]

Cool, we have in fact performed a rolling upgrade on the NM2-GW switches, and while we were upgrading them one after the other, the Exalogic stayed online!
Note: usually there are some small post-upgrade steps to do, which I will not cover here.
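To recap the whole procedure, the sequence can be written down as a small plan generator in Python. This is only a sketch of the order of operations followed in this post for a two-switch rack: the action strings are descriptive labels, not literal commands, and the prerequisite and post-upgrade steps I skipped are not included.

```python
def rolling_upgrade_plan(switches, master):
    """Return (host, action) steps for a two-switch rolling upgrade.

    The standby switch is upgraded first; the master then hands over
    control, gets upgraded itself, and finally has its SM re-enabled.
    """
    standby = [s for s in switches if s != master]
    if len(switches) != 2 or len(standby) != 1:
        raise ValueError("expected exactly two switches, one of them the master")
    upgrade = "load firmware in spsh, reboot, run fwverify and version"
    return [
        (standby[0], upgrade),
        (master, "disablesm, then getmaster to confirm the switchover"),
        (master, upgrade),
        (master, "enablesm"),
    ]


for host, action in rolling_upgrade_plan(["xxxxgw1", "xxxxgw2"], "xxxxgw2"):
    print(host + ": " + action)
```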

1.7     Next time
Next time, we will have a look at how the ZFS 7320 storage appliance can be upgraded in a similar fashion, using the rolling upgrade principle.

Publication date: 18 July 2012

Jos Nijhoff
About the author Jos Nijhoff

Jos Nijhoff is an experienced Application Infrastructure consultant at Qualogy. Currently he plays a key role as technical presales and hands-on implementation lead for Qualogy's exclusive Exalogic partnership with Oracle for the Benelux area. Thus he keeps in close contact with Oracle presales and partner services on new developments, but maintains an independent view. He gives technical guidance and designs, reviews, manages and updates the application infrastructure before, during and after the rollout of new and existing Oracle (Fusion) Applications & Fusion Middleware implementations. Jos is also familiar with subjects like high availability, disaster recovery scenarios, virtualization, performance analysis, data security, and identity management integration with respect to Oracle applications.
