Truth be told, the original title of this post was going to be "How I lost an entire day debugging datapatch which I thought had totally borked my database when trying to apply patch 19.12" but after some reflection and a few gin and tonics I've calmed down enough to describe the situation with more objectivity . I also learned an important thing about the HOST command in SQL*Plus and SQLcl.
If you run the HOST command from SQL*Plus, then we jump out to a standard command prompt window and you can do all all the normal command prompt type of things, eg
SQL> host Microsoft Windows [Version 10.0.19043.1237] (c) Microsoft Corporation. All rights reserved. C:\oracle>type x:\tmp\file.txt Hello World C:\oracle>
Typing exit brings you back into SQL*Plus and you can continue on your way with SQL commands. Of course, when we are running a script, we typically do not want to launch HOST and then just sit there waiting, hence the more normal usage is to run the host commands we wanted directly from within SQL*Plus on the same line, for example
SQL> host type x:\tmp\file.txt Hello World SQL> host echo Hello World Hello World SQL>
I choose the "type" and "echo" commands here to reveal a little more about the Windows command processor. If you go hunting for these programs on your Windows installation, you won't find any such program
C:\Windows>dir /s /b type.exe File Not Found C:\Windows>dir /s /b echo.exe File Not Found
This is because there is a suite of commands that "just work" because they are built into the Windows command program (CMD.EXE) itself. In the same way that within the common Unix shells like Bash, Korn and C shell, there are numerous commands you can use which are not programs in their own right, Windows is the same. Because you are within the CMD.EXE program, these built-in commands will take precedence over programs that can be located by searching through the Windows path. For example, if I create a program "type.exe" in a folder that is present on my PATH which is in reality just a clone of the web scraping program WGET:
C:\oracle>copy c:\utils\wget.exe c:\windows\system32\type.exe 1 file(s) copied.
then it doesn't really matter because the command processor has a TYPE command built-in already, and that always takes precedence:
C:\oracle>type x:\tmp\file.txt Hello World
Similarly, when I run a HOST from SQL*Plus, the built-ins all work as I'd expect even though my "fake" TYPE.EXE program as a clone of WGET is sitting there on the PATH.
SQL> host Microsoft Windows [Version 10.0.19043.1237] (c) Microsoft Corporation. All rights reserved. C:\oracle>type x:\tmp\file.txt Hello World C:\oracle>
But here's an interesting quirk with SQL*Plus. Running a HOST command with additional commands on the same line has a different behaviour. In that circumstance, the PATH is searched first before the built-ins come into play. With my WGET.EXE cloned to TYPE.EXE, look what happens when I run a TYPE command directly from a HOST command in SQL*Plus
SQL> host type x:\tmp\file.txt --2021-10-07 11:46:03-- ftp://x/%5Ctmp%5Cfile.txt => '%5Ctmp%5Cfile.txt' Resolving x (x)... failed: No such host is known. . type: unable to resolve host address 'x'
You can see it ran WGET.EXE which naturally could not resolve a website called "x:\tmp\file.txt"
At this point you're thinking "Well, what kind of doofus copies the WGET.EXE command to TYPE.EXE?" and you'd be quite correct in thinking that this is silly. However, if you had a different existing program called TYPE.EXE, for example, one of the GNU windows compatibility utilities that will tell you whether a file is binary or ascii, then that is a much more reasonable program to have, and that is exactly the situation on my PC.
And as I discovered, man oh man does the existence of a modified TYPE.EXE command in your PATH wreak havoc to any attempt to patch your database.
One of things that datapatch needs to do when it starts is ensure that it can successfully run SQL*Plus on your database before commencing the patching. In order to do this, it will create a SQL*Plus script file with a file name that is suffixed as "exec_DB_script.sql" in a temporary folder. The content of that file looks like below:
set newpage 1 set pagesize 14 connect / AS sysdba set echo off set heading off select 'C:A:T:C:O:N' || status||'##'||instance_name from v$instance / host type nul > [prefix]_exec_DB_script.done exit
The perl program that initiates the SQL*Plus session, runs the script above and then waits for the existence of the file "[prefix]_exec_DB_script.done" to know that SQL*Plus can be successfully executed. But as we have just seen, running HOST TYPE looks for TYPE.EXE in the Windows PATH before using the standard built-in. In my case, the "[prefix]_exec_DB_script.done" never got created because the TYPE command was invalid. Datapatch then hangs - you will see this in your patching session:
Connecting to database ..... OK Gathering datbase info .....
and that's it. The consequence of that (for me) was then trying to debug perl program datapatch.pl , which led to sqlpatch.pl, which led to sqlpatch.pm, which finally led to catcon.pm, an 18,000 line perl behemoth! Adding to the fun was that if you kill datapatch with a Ctrl-C when it hangs, it cleans up a lot of the temporary files and hence the sql script above that reveals the cause of the problem is no longer present!
After a lot of trial and error, and some tutorials on perl debugging, I eventually stumbled onto this part of the code:
# wait until the process running SQL statements terminates # select (undef, undef, undef, 0.01) until (-e $DoneFile);
where $DoneFile is the "[prefix]_exec_DB_script.done" file I mentioned previously. We are endlessly looping around until we see the file come into existence, and since (due to my altered TYPE.EXE command) it never arrives, datapatch hangs forever. There really isn't any way around this besides making sure that you do not have any potential clashes with Windows built-in command names as programs on your PATH, because you will only observe this behaviour with HOST [command] calls from SQL*Plus. SQLcl does not suffer the same issue, but in the case of datapatch, you don't get control on whether you are running SQLcl or SQL*Plus - that decision is controlled by datapatch.
I think a better way to handle creating a file in SQL*Plus would have been to do:
spool [prefix]_exec_DB_script.done spool off
which is platform independent and will not rely on any HOST/OS commands. I'll be raising an enhancement request to get that included in catcon.pm in a future release. Note that also impacts any standard "catcon" scripts you might want to run across multiple pluggables, because datapatch is just leveraging that existing functionality.
Datapatch aside, it is worth always keeping in mind that HOST [command] in SQL*Plus is not exactly the same as launching a new command window and running DOS commands. Hope this blog post will not cause you to lose a day of debugging like I did