Jeff,
Thanks for the rapid response.
> It's also an interesting question why general time was unable to provide any source of time at this point during the boot?
I strongly suspect that this is related to a problem I discovered in 3.14.10, and which Andrew and I thought we had fixed. The problem was a race condition in the creation and use of the WIN32 general time server. It showed up on WIN32 IOCs on multi-core systems. Our fix cured the problem on the system we saw it on, but the new failure looks similar. That fix was made in version 1.38.2.8 of os/WIN32/osdTime.cpp.
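For readers unfamiliar with this class of bug, here is a minimal sketch of one general way to avoid a create-versus-use race on a lazily created service. It is not the fix made in osdTime.cpp; the names `TimeServer`, `getTimeServer` and `constructions` are hypothetical, and it assumes a C++11 or later compiler, where initialization of a function-local static is guaranteed to be thread-safe.

```cpp
// Hypothetical counter, only here so the single construction can be observed.
static int constructions = 0;

// Hypothetical stand-in for a time server that must exist before first use.
struct TimeServer {
    TimeServer() { ++constructions; }
};

// Every caller goes through this accessor, so the server cannot be used
// before it exists. The first call constructs it; concurrent first calls
// block until construction has finished (guaranteed since C++11).
static TimeServer &getTimeServer()
{
    static TimeServer server;
    return server;
}
```

However many threads call `getTimeServer()` at startup, the constructor runs once and every caller receives the same object.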
> That change causes the code to return a time stamp based on an uninitialized auto (stack stored) structure. That would introduce undefined
> behavior which is probably not a good idea IMHO.
Right, that was just intended to be a quick and dirty fix to get past the initial crash, and demonstrate that it was just the very first call to epicsGetTimeCurrent that was failing and throwing the exception.
Mark
________________________________
From: Jeff Hill [mailto:[email protected]]
Sent: Mon 5/11/2009 4:29 PM
To: Mark Rivers; 'Andrew Johnson'; 'Eric Norum'
Cc: [email protected]
Subject: RE: Problem with WIN32
Mark,
That change causes the code to return a time stamp based on an uninitialized auto (stack stored) structure. That would introduce undefined behavior which is probably not a good idea IMHO.
It's quite normal for C++ codes to report serious failures such as this one by throwing an exception. There are of course two ways you can get into trouble.
1) If there isn't a try/catch block when crossing a membrane between C code and C++ code, trouble is almost guaranteed. I have been very careful about these membranes in R3.14.
2) C++ code fails to introduce the try/catch blocks needed to deal with exceptions. It's starting to look like this one is the cause.
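As an illustration of point 1, here is a minimal sketch of a guarded C/C++ membrane. The functions `parsePositive` and `parsePositiveC` are hypothetical and not part of EPICS base; the point is only that the C-callable wrapper converts every exception into a status code, so nothing propagates into C code.

```cpp
#include <cstdio>
#include <exception>
#include <stdexcept>

// Hypothetical C++ routine that reports failure by throwing.
static int parsePositive(int value)
{
    if (value <= 0) {
        throw std::invalid_argument("value must be positive");
    }
    return value;
}

// C-callable wrapper: no exception may escape into C code, so every
// failure is converted to a status code (0 = success, -1 = failure).
extern "C" int parsePositiveC(int value, int *pResult)
{
    try {
        *pResult = parsePositive(value);
        return 0;
    }
    catch (const std::exception &e) {
        std::fprintf(stderr, "parsePositiveC: %s\n", e.what());
        return -1;
    }
    catch (...) {
        std::fprintf(stderr, "parsePositiveC: unknown C++ exception\n");
        return -1;
    }
}
```

A C caller only ever sees the return value, for example `if (parsePositiveC(n, &result) != 0) { /* handle error */ }`.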
I had a look at the stack trace, which was very helpful BTW, and I can see that we are currently fetching the time in the last chance exception handler for a C++ based thread when a 2nd exception is thrown. I don't know which thread received the precipitating exception, but it is a pretty good guess that the precipitating exception might also have been caused by fetching the current time.
I think that two fixes need to occur.
1) We need to determine which thread caused the precipitating exception and upgrade its error handling - adding a try catch block at a strategic location. I am going to guess that the precipitating exception might have been thrown in the timer queue library.
2) The code below in the exception handler that fetches the time in order to print a diagnostic message needs a try catch block.
It's also an interesting question why general time was unable to provide any source of time at this point during the boot. On Windows the original R3.14 had an excellent source of time, created using high precision performance counter based time synchronized to the real time clock - a similar approach to that used by the Perl community BTW. So it's less than clear why general time wouldn't fall back to that time source (essentially one that is available from the OS). FWIW, NTP uses cascaded PLLs to produce discontinuity-proof transitions between different time sources. That might be a better approach compared to the abrupt changes I seem to recall occur when switching time sources in general time.
catch ( ... ) {
    if ( ! waitRelease ) {
        epicsTime cur = epicsTime::getCurrent ();   // <================= here
        char date[64];
        cur.strftime ( date, sizeof ( date ), "%a %b %d %Y %H:%M:%S.%f");
        char name [128];
        epicsThreadGetName ( pThread->id, name, sizeof ( name ) );
        errlogPrintf (
            "epicsThread: Unknown C++ exception in thread \"%s\" at %s\n",
            name, date );
        errlogFlush ();
        // this should behave as the C++ implementation intends when an
        // exception isnt handled. If users dont like this behavior, they
        // can install an application specific unexpected handler.
        std::unexpected ();
    }
}
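A minimal sketch of what fix 2 could look like, written without the EPICS headers so that it stands alone. `currentTimeString` is a hypothetical stand-in for the `epicsTime::getCurrent()` and `strftime()` calls, the timestamp it returns is a placeholder, and `describeUnknownException` is likewise an assumption; the actual change in base may differ.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for epicsTime::getCurrent() plus strftime():
// throws when no time source is available.
static std::string currentTimeString(bool timeAvailable)
{
    if (!timeAvailable) {
        throw std::runtime_error("unable to fetch current time");
    }
    return "Mon May 11 2009 16:29:00.000"; // placeholder timestamp
}

// Build the diagnostic line without letting a time-fetch failure escape
// from the handler: if the time cannot be fetched, say so in the message.
static std::string describeUnknownException(const char *threadName,
                                            bool timeAvailable)
{
    std::string date;
    try {
        date = currentTimeString(timeAvailable);
    }
    catch (...) {
        date = "<time unavailable>";
    }
    return std::string("epicsThread: Unknown C++ exception in thread \"")
        + threadName + "\" at " + date;
}
```

With this shape the diagnostic is still printed when the time fetch fails, and no second exception is thrown from inside the last chance handler.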
Jeff
From: [email protected] [mailto:[email protected]] On Behalf Of Mark Rivers
Sent: Monday, May 11, 2009 2:39 PM
To: Andrew Johnson; Eric Norum
Cc: [email protected]
Subject: RE: Problem with WIN32
Folks,
I am quite sure this is a bug in base. I modified libCom/osi/epicsTime.cpp so it just prints an error, rather than throwing an exception, if epicsTimeGetCurrent() returns an error:
corvette:src/libCom/osi>cvs diff epicsTime.cpp
Index: epicsTime.cpp
===================================================================
RCS file: /net/phoebus/epicsmgr/cvsroot/epics/base/src/libCom/osi/epicsTime.cpp,v
retrieving revision 1.25.2.20
diff -u -r1.25.2.20 epicsTime.cpp
--- epicsTime.cpp 18 Apr 2008 18:39:19 -0000 1.25.2.20
+++ epicsTime.cpp 11 May 2009 20:30:01 -0000
@@ -192,7 +192,8 @@
epicsTimeStamp current;
int status = epicsTimeGetCurrent (&current);
if (status) {
- throwWithLocation ( unableToFetchCurrentTime () );
+printf("epicsTime::getCurrent, unable to fetch current time\n");
+ //throwWithLocation ( unableToFetchCurrentTime () );
}
return epicsTime ( current );
}
With this change I observe that when the IOC starts up I see one of those error messages, and no more. I think we still have a timing problem where the generalTime system is not up and running on Windows before it is first being called. I have been using this patched version of 3.14.10 for a while with no problems, but my application recently got more complex, with more DLLs being loaded when the application starts. I believe that is slowing things down enough when the application starts that we are now seeing another problem. The version of osi/os/WIN32/osdTime.cpp that I am using is effectively 1.38.2.8, which Andrew and I worked on to fix similar problems in 3.14.10.
$ ../../bin/win32-x86/prosilicaApp.exe st.cmd.win32
epicsTime::getCurrent, unable to fetch current time
< envPaths.win32
epicsEnvSet("ARCH","win32-x86")
...
Mark
________________________________
From: Mark Rivers
Sent: Monday, May 11, 2009 2:51 PM
To: Andrew Johnson; 'Eric Norum'
Subject: Problem with WIN32
Folks,
I am getting a crash when I start the areaDetector IOC on win32-x86 and win32-x86-debug. It looks like it might be a problem in base, perhaps related to the bug Andrew and I previously fixed with the timer not being created before it was being used. Here is the trace. The problem is in the 7th line from the bottom, copied here:
simDetectorApp.exe!throwExceptionWithLocation<epicsTime::unableToFetchCurrentTime>(const epicsTime::unableToFetchCurrentTime & parm={...}, const char * pFileName=0x00616620, unsigned int lineNo=195) Line 74 C++
This is happening right when the IOC starts up, even without an st.cmd file.
Mark
ntdll.dll!7c90eb94()
[Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]
user32.dll!7e419418()
user32.dll!7e42dba8()
user32.dll!7e42593f()
user32.dll!7e43a91e()
> simDetectorApp.exe!_output_s_l(_iobuf * stream=0x00000000, const char * format=0x001449b0, localeinfo_struct * plocinfo=0x00144690, char * argptr=0x00012012) Line 1164 + 0x17 bytes C++
user32.dll!7e466278()
user32.dll!7e450617()
user32.dll!7e4505cf()
simDetectorApp.exe!__crtMessageBoxA(const char * lpText=0x003b8620, const char * lpCaption=0x0061fe50, unsigned int uType=73746) Line 145 C
simDetectorApp.exe!__crtMessageWindowA(int nRptType=1, const char * szFile=0x00000000, const char * szLine=0x00000000, const char * szModule=0x00000000, const char * szUserMessage=0x003b9694) Line 420 + 0x16 bytes C
simDetectorApp.exe!_VCrtDbgReportA(int nRptType=1, const char * szFile=0x00000000, int nLine=0, const char * szModule=0x00000000, const char * szFormat=0x0061e2c0, char * arglist=0x003be728) Line 417 + 0x28 bytes C
simDetectorApp.exe!_CrtDbgReportV(int nRptType=1, const char * szFile=0x00000000, int nLine=0, const char * szModule=0x00000000, const char * szFormat=0x0061e2c0, char * arglist=0x003be728) Line 300 + 0x1d bytes C
simDetectorApp.exe!_CrtDbgReport(int nRptType=1, const char * szFile=0x00000000, int nLine=0, const char * szModule=0x00000000, const char * szFormat=0x0061e2c0, ...) Line 317 + 0x1d bytes C
simDetectorApp.exe!_NMSG_WRITE(int rterrnum=10) Line 197 + 0x18 bytes C
simDetectorApp.exe!abort() Line 59 + 0x7 bytes C
simDetectorApp.exe!terminate() Line 136 C++
simDetectorApp.exe!__CxxUnhandledExceptionFilter(_EXCEPTION_POINTERS * pPtrs=0x003bf1cc) Line 72 C++
kernel32.dll!7c863016()
simDetectorApp.exe!_XcptFilter(unsigned long xcptnum=3765269347, _EXCEPTION_POINTERS * pxcptinfoptrs=0x003bf1cc) Line 237 + 0xa bytes C
simDetectorApp.exe!_callthreadstartex() Line 350 + 0x17 bytes C
simDetectorApp.exe!@_EH4_CallFilterFunc@8() + 0x12 bytes Asm
simDetectorApp.exe!_except_handler4(_EXCEPTION_RECORD * ExceptionRecord=0x003bf2c0, _EXCEPTION_REGISTRATION_RECORD * EstablisherFrame=0x003bff98, _CONTEXT * ContextRecord=0x003bf2e0, void * DispatcherContext=0x003bf294) + 0xb7 bytes C
ntdll.dll!7c9037bf()
ntdll.dll!7c90378b()
ntdll.dll!7c937860()
ntdll.dll!7c90eafa()
kernel32.dll!7c812a5b()
ntdll.dll!7c9106eb()
ntdll.dll!7c911538()
kernel32.dll!7c812a5b()
ntdll.dll!7c911596()
ntdll.dll!7c9106eb()
ntdll.dll!7c9106eb()
ntdll.dll!7c911538()
ntdll.dll!7c919a9c()
ntdll.dll!7c919b3f()
ntdll.dll!7c919aeb()
ntdll.dll!7c911538()
ntdll.dll!7c919aeb()
ntdll.dll!7c919d27()
ntdll.dll!7c919a9c()
ntdll.dll!7c919b3f()
ntdll.dll!7c919aeb()
kernel32.dll!7c812a5b()
simDetectorApp.exe!fetchWin32ThreadGlobal() Line 183 + 0x11 bytes C
simDetectorApp.exe!_CxxThrowException(void * pExceptionObject=0x003bf650, const _s__ThrowInfo * pThrowInfo=0x0063e0e0) Line 166 C++
simDetectorApp.exe!throwExceptionWithLocation<epicsTime::unableToFetchCurrentTime>(const epicsTime::unableToFetchCurrentTime & parm={...}, const char * pFileName=0x00616620, unsigned int lineNo=195) Line 74 C++
simDetectorApp.exe!epicsTime::getCurrent() Line 195 + 0x18 bytes C++
simDetectorApp.exe!epicsThreadCallEntryPoint(void * pPvt=0x003a96f4) Line 68 + 0xc bytes C++
simDetectorApp.exe!epicsWin32ThreadEntry(void * lpParameter=0x003a9a68) Line 498 + 0x11 bytes C
simDetectorApp.exe!_callthreadstartex() Line 348 + 0xf bytes C
simDetectorApp.exe!_threadstartex(void * ptd=0x003a9ac8) Line 331 C
kernel32.dll!7c80b683()