Subject: | Re: Matlab 2020b crashes with labCA 3.7.2 |
From: | Till Straumann via Tech-talk <tech-talk at aps.anl.gov> |
To: | Miroslaw Dach <mdach at lbl.gov> |
Cc: | Gregory Portmann <gjportmann at lbl.gov>, "Corbett, William J." <corbett at slac.stanford.edu>, EPICS Techtalk <tech-talk at aps.anl.gov> |
Date: | Fri, 5 Mar 2021 15:50:51 +0100 |
Hi Till,
Thank you very much for the in-depth study of the problem. It looks like Mathworks has changed something in the code and, even worse, has introduced an "unwanted feature" which affects Matlab 2020b and labCA users on RHEL7.
We will try one of your suggestions and let you know how things are.
Many Thanks
Mirek
On Mon, Mar 1, 2021 at 2:01 AM Till Straumann <till.straumann at psi.ch> wrote:
Hi Mirek.
I have investigated this deadlock and have come to the conclusion that it is a problem
with matlab, probably under RHEL7 (I have no way to test under other systems, in particular
windows, though).
When I debugged the deadlock I found that some matlab threads deadlock in a library called
glibc-2.17_shim.so
This (mathworks proprietary) library is LD_PRELOADed from the 'matlab' driver script where we find a comment:
# Preload glibc_shim in case of RHLE7 variants
test -e /usr/bin/ldd && ldd --version | grep -q "(GNU libc) 2\.17" \
&& export LD_PRELOAD="$LD_PRELOAD:$MATLAB/bin/glnxa64/glibc-2.17_shim.so" \
&& export MW_GLIBC_SHIM="$MATLAB/bin/glnxa64/glibc-2.17_shim.so"
which leads to the hypothesis that only RHEL7 may be affected.
The deadlock happens when matlab
- loads a shared object (or library)
- AND the shared object executes some initialization code (e.g., constructors of global objects defined in the library)
- AND the initialization code calls 'pthread_join()'. 'pthread_join()' then never returns.
Note that if 'ordinary' code in the shared object (i.e., as opposed to initialization code) uses 'pthread_join()' then
that works fine.
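For illustration, the pattern described above can be reduced to a few lines. This is only a sketch, not the attached file: the names 'create_and_join', 'InitJoin' and 'g_joined' are made up here. Built into an ordinary program or shared object outside of matlab, the join in the global constructor is expected to return normally; the hang was only observed when such code is loaded by matlab 2020b on RHEL7.

```cpp
// Minimal sketch: create and join a thread from the constructor of a
// global object, i.e. during initialization of the containing object
// (before main() or while the shared object is being loaded).
// All names below are hypothetical and chosen for this illustration.
#include <pthread.h>

// Set to 1 once pthread_join() has returned during initialization.
static int g_joined = 0;

static void *worker(void *)
{
    return nullptr;   // the thread terminates immediately
}

// Returns 0 if a thread could be created and joined, -1 otherwise.
static int create_and_join()
{
    pthread_t id;
    if (pthread_create(&id, nullptr, worker, nullptr) != 0)
        return -1;
    // This is the call that reportedly never returns when executed
    // from initialization code loaded by matlab 2020b on RHEL7.
    if (pthread_join(id, nullptr) != 0)
        return -1;
    return 0;
}

// The constructor of a global object runs as initialization code.
struct InitJoin {
    InitJoin() { g_joined = (create_and_join() == 0) ? 1 : 0; }
};
static InitJoin g_init;
```

Calling 'create_and_join()' later from ordinary code corresponds to the case that works fine under matlab as well.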
A simple example mex file (attached), which does not use labca or epics, reproduces the described behaviour.
EPICS' libCom does use 'pthread_join()' during initialization and is therefore affected.
At this point I can suggest two possible work-arounds (using one of them is sufficient):
1.) Use an EPICS-base build with posix priority scheduling disabled. This avoids a section of initialization
code which calls 'pthread_join()'
E.g., in configure/CONFIG_SITE:
USE_POSIX_THREAD_PRIORITY_SCHEDULING = NO
2.) LD_PRELOAD EPICS' libCom.so *before* starting matlab
LD_PRELOAD=<path_to_my_epics_lib>/libCom.so matlab
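To check from inside the process whether a given object (e.g., libCom.so or the glibc shim) is actually mapped, something along these lines could be called from a mex file. This is a sketch that assumes Linux with glibc; 'dl_iterate_phdr()' is a glibc extension declared in <link.h>, and the helper names 'Query', 'visit' and 'object_is_loaded' are made up for this illustration.

```cpp
// Sketch (Linux/glibc assumed): test whether a shared object whose path
// contains a given substring is currently loaded into this process.
#include <link.h>     // dl_iterate_phdr(), struct dl_phdr_info
#include <cstddef>
#include <cstring>

// Hypothetical helper type: search string plus number of matches.
struct Query {
    const char *needle;
    int         hits;
};

// Callback invoked once per loaded object; returning 0 continues the walk.
static int visit(struct dl_phdr_info *info, size_t, void *data)
{
    Query *q = static_cast<Query *>(data);
    if (info->dlpi_name && std::strstr(info->dlpi_name, q->needle))
        q->hits++;
    return 0;
}

// Returns true if any loaded object's path contains 'needle'.
static bool object_is_loaded(const char *needle)
{
    Query q = { needle, 0 };
    dl_iterate_phdr(visit, &q);
    return q.hits > 0;
}
```

For example, 'object_is_loaded("libCom.so")' should be true in a matlab session started with the LD_PRELOAD line above, and 'object_is_loaded("glibc-2.17_shim.so")' tells whether the shim is present at all.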
HTH
- Till
On 2/19/21 4:20 AM, Miroslaw Dach wrote:
Hi Till,
We have crossed each other. You came to PSI from the US and I did the opposite. I moved to work in LBL.
Are you still maintaining labCA? We are facing a problem with Matlab 2020b crashing when using labCA 3.7.2.
It looks like an incompatibility between Matlab 2020b and the latest official labCA version.
labCA 3.7.2 seems to be the latest version, unless you have a newer one?
Best Regards
Mirek
/* Demonstrate a problem with pthread_join() from initialization
 * code under matlab2020b
 *
 * For a more realistic example you can compile this file into
 * a library (mimicks an external library):
 *
 *   g++ -DBUILD_AS_LIBRARY -shared -fPIC mexJoin.cc -o libXXX.so
 *
 * and a separate mex-file:
 *
 *   mex -cxx -DBUILD_AS_MEXFILE mexJoin.cc -L. -lXXX
 *
 * Alternatively, the code can be compiled into a single mexfile:
 *
 *   mex -cxx mexJoin.cc
 */

/* Use stdio for printing messages to ensure matlab is not interfering.
 * Must use matlab in CLI mode, however, in order to see the messages:
 *
 *   matlab -nodisplay -nosplash -nojvm
 *
 * (or watch the terminal window from where the matlab GUI was started)
 */

/* Author: Till Straumann <till.straumann at psi.ch>, 2021 */

#include <string.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>

#if ! defined(BUILD_AS_LIBRARY) && ! defined(BUILD_AS_MEXFILE)
#define BUILD_AS_LIBRARY
#define BUILD_AS_MEXFILE
#endif

extern "C" {
	int join_a_thread(int skip);
};

#ifdef BUILD_AS_LIBRARY

static void *
some_thread(void *arg)
{
	fprintf(stderr, "Thread terminated\n");
	return 0;
}

extern "C" {

int join_a_thread(int skip)
{
	pthread_t id;

	switch ( skip ) {
		case 1:
			fprintf(stderr, "Skipping thread creation and joining during initialization phase\n\n");
			return -1;
		case 2:
			fprintf(stderr, "Skipping thread creation and joining during finalization phase\n\n");
			return -1;
		case -1:
			fprintf(stderr, "Attempting thread creation and joining during initialization phase\n");
			break;
		case -2:
			fprintf(stderr, "Attempting thread creation and joining during finalization phase\n");
			break;
		default:
			fprintf(stderr, "Attempting thread creation and joining from mexFunction\n");
			break;
	}

	if ( pthread_create( &id, 0, some_thread, 0 ) ) {
		fprintf(stderr, "Unable to create thread\n\n");
		return -1;
	}
	fprintf(stderr, "Created a thread\n");

	// 2020b under RHEL7 (compiled with g++ 9.3.0)
	// deadlocks in 'pthread_join' -- somewhere in
	// MW's glibc-2.17_shim.so. The same happens
	// when code which attempts 'pthread_join' from
	// a global initializer is loaded with dlopen().
	//
	// NOTE: 'join_a_thread()' works fine when executed
	//       from the mex-function itself; the deadlock
	//       occurs only when this is attempted during
	//       library initialization!
	fprintf(stderr, "Attempting to join the thread\n");
	if ( pthread_join( id, 0 ) ) {
		fprintf(stderr, "Unable to join thread\n\n");
		return -1;
	}
	fprintf(stderr, "Successfully joined!\n\n");
	return 0;
}

}

// Set the environment variable 'SKIP_JOIN_DURING_INIT' (prior to starting
// matlab) to verify that joining works when executed from the mexFunction
// itself.
class Initializer {
public:
	Initializer()
	{
		join_a_thread( ( (!! getenv("SKIP_JOIN_DURING_INIT")) ? 1 : -1) );
	}

	~Initializer()
	{
		join_a_thread( ( (!! getenv("SKIP_JOIN_DURING_EXIT")) ? 2 : -2) );
	}
};

static Initializer v;

#endif

#ifdef BUILD_AS_MEXFILE

#include <mex.hpp>
#include <mexAdapter.hpp>

using namespace matlab::data;
using matlab::mex::ArgumentList;

class MexFunction : public matlab::mex::Function {
public:
	void operator()(matlab::mex::ArgumentList o, matlab::mex::ArgumentList i)
	{
		//mexLock(); -- could prevent unloading
		join_a_thread(0);
	}
};

#endif