FYI, I wrote PyArchAppl (https://github.com/archman/PyArchAppl) for a similar purpose, and its retrieval tool, "pyarchappl-get", is sort of well tested with thousands of PVs at FRIB.
Thanks,
Tong
-----Original Message-----
From: Tech-talk <tech-talk-bounces at aps.anl.gov> On Behalf Of Michael Davidsaver via Tech-talk
Sent: Friday, March 18, 2022 9:12 PM
To: Manoussakis, Adamandios <manoussakis1 at llnl.gov>
Cc: Shankar, Murali <mshankar at slac.stanford.edu>; tech-talk at aps.anl.gov
Subject: Re: Epics Archiver Appliance and Network Storage slowdown
On 3/17/22 14:53, Manoussakis, Adamandios via Tech-talk wrote:
> I understand; it seems like 3000+ sequential calls would probably not be ideal. Would it be easier to add a multiple-PV call to getData, similar to how you can pass a list to getDataAtTime? I appreciate you answering all my questions; the AA has been great.
FYI, I have some CLI scripts for AA retrieval. When I originally
developed carchivetools I was able to achieve speeds comparable to
plain HTTP retrieval (i.e. bandwidth limited). Both tools listed below
will make parallel requests up to a configurable limit.
carchivetools (cf. "arget") is the original. Well tested, but
showing its age.
https://github.com/mdavidsaver/carchivetools
aaclient is a rewrite (cf. "aaget") using more recent Python
constructs. Probably easier to set up now, but not as well tested.
https://github.com/mdavidsaver/aaclient
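
To illustrate what "parallel requests up to a configurable limit" means in practice, here is a minimal Python sketch using only the standard library. It is not taken from carchivetools or aaclient: `fetch_one` is a hypothetical stand-in for a single-PV retrieval request, and `max_workers` plays the role of the configurable limit.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_one(pv):
    # Hypothetical stand-in for one single-PV retrieval request
    # (e.g. an HTTP GET against the appliance); here it returns a stub.
    return {"pv": pv, "samples": []}


def fetch_many(pvs, fetch=fetch_one, max_workers=8):
    """Fetch every PV with at most max_workers requests in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order and re-raises the first error.
        results = list(pool.map(fetch, pvs))
    return dict(zip(pvs, results))


if __name__ == "__main__":
    data = fetch_many(["PV:A", "PV:B", "PV:C"], max_workers=2)
    print(sorted(data))
```

Lowering `max_workers` is a cheap way to check whether the storage backend, rather than the client, is what limits throughput.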
> Thanks,
>
> Adam
>
> *From:* Shankar, Murali <mshankar at slac.stanford.edu>
> *Sent:* Thursday, March 17, 2022 2:48 PM
> *To:* Manoussakis, Adamandios <manoussakis1 at llnl.gov>; tech-talk at aps.anl.gov
> *Subject:* Re: Epics Archiver Appliance and Network Storage slowdown
>
>>> With getData could you not set the from/to at the same time and it would request at just that point in time though or would that not work similar to getDataAtTime?
>
> getData is still targeted at a single PV. So you could do this one PV at a time. Passing in multiple PV's is also still one PV at a time so the performance would be that of 3000 sequential calls to getData.
>
>>> Is there any way to use fetchLatestMetaData with getDataAtTime?
>
> Not yet. I think this might be a reasonable amount of work.
>
> Regards,
>
> Murali
>
> ----------------------------------------
>
> *From:* Manoussakis, Adamandios <manoussakis1 at llnl.gov>
> *Sent:* Thursday, March 17, 2022 2:38 PM
> *To:* Shankar, Murali <mshankar at slac.stanford.edu>; tech-talk at aps.anl.gov
> *Subject:* RE: Epics Archiver Appliance and Network Storage slowdown
>
> Thanks for the clarification Murali, I am using Hans repo from the setup section of the AA. I will see where the startup script resides and try out the adjustment to the parameter.
>
> Sorry just wanted to revisit these two questions as well:
>
> With getData could you not set the from/to at the same time and it would request at just that point in time though or would that not work similar to getDataAtTime?
>
> Is there any way to use fetchLatestMetaData with getDataAtTime?
>
> Thanks
>
> Adam
>
> *From:* Shankar, Murali <mshankar at slac.stanford.edu>
> *Sent:* Wednesday, March 16, 2022 9:44 PM
> *To:* Manoussakis, Adamandios <manoussakis1 at llnl.gov>; tech-talk at aps.anl.gov
> *Subject:* Re: Epics Archiver Appliance and Network Storage slowdown
>
> You'd usually have some copy of sampleStartup.sh that you'd be using. In this you'd be setting JAVA_OPTS; something like so
>
> export JAVA_OPTS="-XX:+UseG1GC -Xmx4G -Xms4G -ea"
>
> You can set the parameter here
>
> export JAVA_OPTS="-XX:+UseG1GC -Xmx4G -Xms4G -ea -Djava.util.concurrent.ForkJoinPool.common.parallelism=48"
>
> There's usually something like a start function
>
> function start() {
>     startTomcatAtLocation ${ARCHAPPL_DEPLOY_DIR}/mgmt
>     startTomcatAtLocation ${ARCHAPPL_DEPLOY_DIR}/engine
>     startTomcatAtLocation ${ARCHAPPL_DEPLOY_DIR}/etl
>     startTomcatAtLocation ${ARCHAPPL_DEPLOY_DIR}/retrieval
> }
>
> You can set the JAVA_OPTS for just the retrieval component like so...
>
> function start() {
>     startTomcatAtLocation ${ARCHAPPL_DEPLOY_DIR}/mgmt
>     startTomcatAtLocation ${ARCHAPPL_DEPLOY_DIR}/engine
>     startTomcatAtLocation ${ARCHAPPL_DEPLOY_DIR}/etl
>     export JAVA_OPTS="-XX:+UseG1GC -Xmx4G -Xms4G -ea -Djava.util.concurrent.ForkJoinPool.common.parallelism=48"
>     startTomcatAtLocation ${ARCHAPPL_DEPLOY_DIR}/retrieval
> }
>
> Again, I don't know the specifics of your setup but something like this was what I used when testing it locally.
>
> Hope that helps.
>
> Regards,
>
> Murali
>
> ----------------------------------------
>
> *From:* Shankar, Murali <mshankar at slac.stanford.edu>
> *Sent:* Wednesday, March 16, 2022 6:26 PM
> *To:* Manoussakis, Adamandios <manoussakis1 at llnl.gov>; tech-talk at aps.anl.gov
> *Subject:* Re: Epics Archiver Appliance and Network Storage slowdown
>
>>> What file does the JVM parameter live in that I can change?
>
> Just whatever startup file you use.. This is just another JVM parameter.
>
> Regards,
>
> Murali
>
> ----------------------------------------
>
> *From:* Manoussakis, Adamandios <manoussakis1 at llnl.gov>
> *Sent:* Wednesday, March 16, 2022 5:50 PM
> *To:* Shankar, Murali <mshankar at slac.stanford.edu>; tech-talk at aps.anl.gov
> *Subject:* RE: Epics Archiver Appliance and Network Storage slowdown
>
> I am looking into the NFS mounting and whether there is anything I can do to optimize performance. I currently have a Synology 4-bay DSM set up in Synology's version of RAID 1 with btrfs. The AA is running on a VM with Ubuntu, but I haven't really had many network issues with the requests going out.
>
> What file does the JVM parameter live in that I can change?
>
> With getData could you not set the from/to at the same time and it would request at just that point in time though?
>
> I will try to break down the requests and let each request finish before sending out the next chunk, maybe the parallel nature is just bogging down the filesystem?
>
> Side Question: I had posted in the issues for the github, but is there any way to use fetchLatestMetaData with getDataAtTime?
>
> Thanks,
>
> Adam
>
> *From:* Shankar, Murali <mshankar at slac.stanford.edu>
> *Sent:* Tuesday, March 15, 2022 4:21 PM
> *To:* Manoussakis, Adamandios <manoussakis1 at llnl.gov>; tech-talk at aps.anl.gov
> *Subject:* Re: Epics Archiver Appliance and Network Storage slowdown
>
>>> For the 18000PVs that was also with getDataAtTime endpoint I assume?
>
> Yes; this was with the getDataAtTime API call.
>
>>> I would agree I am not completely sold on the network not being the issue
>
> It may not be the network; may be the file system...
>
>>> would I need to rebuild the AA or is it a config at run time read in?
>
> It is just a config change; just an extra JVM parameter when you start up just the retrieval component.
>
>>> Can you request multiple PVs at once with the getData endpoint?
>
> You can, but it's probably not what you are looking for if you want data at some point in time.
>
>>> they are quite a few large waveform arrays
>
> I would try breaking these out into separate request(s) at least for testing purposes.
>
> Regards,
>
> Murali
>
> ----------------------------------------
>
> *From:* Manoussakis, Adamandios <manoussakis1 at llnl.gov>
> *Sent:* Tuesday, March 15, 2022 3:54 PM
> *To:* Shankar, Murali <mshankar at slac.stanford.edu>; tech-talk at aps.anl.gov
> *Subject:* RE: Epics Archiver Appliance and Network Storage slowdown
>
> Thanks for all your help Murali!
>
> I would agree; I am not completely sold on the network not being the issue, but the file transfer itself was not as slow as I thought it would be when benchmarking with dd.
>
> For the 18000 PVs, that was also with the getDataAtTime endpoint, I assume? I am a little curious to maybe just hook up my NFS to a switch and the AA PC and test pulling directly instead of going through all the IT infrastructure of our isolated network. That might help narrow down whether it's the network storage file access or the network itself.
>
> I would like to try this but could you explain where I would add/change this, would I need to rebuild the AA or is it a config at run time read in?
>
> That is correct; of the 3000 PVs, there are quite a few large waveform arrays, causing the 40MB file size.
>
> Other question: Can you request multiple PVs at once with the getData endpoint? I wasn't sure if I could pass a list similar to the getDataAtTime endpoint, or if I would need to do pv=name1&pv=name2&pv=name3, if that is possible at all. I think we had gone with getDataAtTime due to the list-of-PVs option and passing a specific time (which I think I can just pass as the same from/to time with getData?), but if the better way is to execute the call I will gladly swap.
>
> Adam
>
> *From:* Shankar, Murali <mshankar at slac.stanford.edu>
> *Sent:* Tuesday, March 15, 2022 9:48 AM
> *To:* Manoussakis, Adamandios <manoussakis1 at llnl.gov>; tech-talk at aps.anl.gov
> *Subject:* Re: Epics Archiver Appliance and Network Storage slowdown
>
> I did some testing locally as well.
>
>>> I do not think it has to do much with the network
>
> I would be inclined to disagree (but I may be wrong); the getDataAtTime API is a secondary API (the primary one being the getData API call) and is largely targeted at quality control for save/restore applications. The PB data formats are not necessarily optimized for this call, and therefore we use a lot of parallelism to extract whatever performance we can. So the latency of the file system calls (and not the throughput) is a significant factor in the performance of this API call.
>
> For example, in my test system (using NFS as my MTS), I am able to get 18000 PV's in 42 seconds. In one of my production systems (with some real data variance), I am able to get 9000 PV's in 9 seconds. In all of these systems, there is probably an upper limit after which we run out of some resource further downstream.
>
> The most obvious one is the size of the ForkJoin common thread pool and I tested this in a different production system ( with a slightly older GPFS). I was able to improve performance in the same system by increasing the ForkJoin common thread pool size. So this is a 12 core/24 thread CPU. With the default ForkJoin common thread pool size, I get 9000 PV's in a minute. By increasing this to 48
>
> ( -Djava.util.concurrent.ForkJoinPool.common.parallelism=48 )
>
> and then to 64
>
> ( -Djava.util.concurrent.ForkJoinPool.common.parallelism=64 )
>
> I was able to reduce this to 30 seconds or less. So, based on your setup, you could try a similar thing and see if it makes a difference. Alternatively, you can also try reducing this to see if the performance is the result of some starvation someplace.
>
>>> file size for the 3000 PVs is about 40MB only
>
> I still do not understand this; 18000 scalar PV's for me are 2MB. How do 3000 PV's become 40MB in your situation (unless you have some large waveforms in the mix)?
>
> Hope that helps.
>
> Regards,
>
> Murali
>
> ----------------------------------------
>
> *From:* Manoussakis, Adamandios <manoussakis1 at llnl.gov>
> *Sent:* Monday, March 14, 2022 5:46 PM
> *To:* Manoussakis, Adamandios <manoussakis1 at llnl.gov>; Shankar, Murali <mshankar at slac.stanford.edu>; tech-talk at aps.anl.gov
> *Subject:* RE: Epics Archiver Appliance and Network Storage slowdown
>
> Hi Murali,
>
> I had some data back from testing, the http post request is asking for about 3000 PVs from the AA which stores about 4500 PVs Total in the LTS.
>
> I did the benchmark test to just transfer a single pb file and was getting around 100MB/s, so I do not think it has much to do with the network, since the file size for the 3000 PVs is only about 40MB. I then reduced the request down to about 1200 PVs and it definitely sped up quite a bit, to around 15 seconds from the 2.5 minutes it was taking. Someone mentioned the lookup time for each PV was log(n) or nlog(n) for the AA, but that doesn't seem like it would have this much impact going from 1200 to 3000 in the POST.
>
> I am curious how many PVs you usually request in a single http when requesting from the AA, is there an upper limit I should stick to? You mentioned possibly breaking up the requests into smaller chunks.
>
> Thanks
>
> Adam
>
> *From:* Tech-talk <tech-talk-bounces at aps.anl.gov> *On Behalf Of* Manoussakis, Adamandios via Tech-talk
> *Sent:* Sunday, February 20, 2022 10:44 PM
> *To:* Shankar, Murali <mshankar at slac.stanford.edu>; tech-talk at aps.anl.gov
> *Subject:* RE: Epics Archiver Appliance and Network Storage slowdown
>
> Thanks Murali,
>
> I will try to run some more tests, including the one that was mentioned, to make sure the transfer rates look correct. I can't imagine that local vs. NAS on this small of a setup should be vastly different.
>
> *From:* Tech-talk <tech-talk-bounces at aps.anl.gov> *On Behalf Of* Shankar, Murali via Tech-talk
> *Sent:* Thursday, February 17, 2022 12:01 PM
> *To:* tech-talk at aps.anl.gov
> *Subject:* Re: Epics Archiver Appliance and Network Storage slowdown
>
>>> I think for this experiment its only 6000 PVs
>
> I think that should not take this long. Will look into this a bit here as well.
>
> Regards,
>
> Murali
>
> ----------------------------------------
>
> *From:* Shankar, Murali
> *Sent:* Saturday, February 12, 2022 8:03 AM
> *To:* tech-talk at aps.anl.gov
> *Subject:* Re: Epics Archiver Appliance and Network Storage slowdown
>
>>> We are using the getDataAtTime endpoint
>
>>> 40MB file from the archive
>
> By this I'd guess that you are getting data for several 100,000's of PVs? The getDataAtTime API call has to look at all these 100,000 files (with perhaps cache misses for most of them) and then do a binary search to determine the data point. Your NAS needs to support quite a high rate of IOPS for this to come back quickly. And this is a use case where even the smallest latency tends to accumulate quickly. Perhaps you can consider breaking down your request into smaller chunks when using the NAS?
>
> Regards,
>
> Murali
>
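
For what it's worth, the suggestion in the thread above to break a large getDataAtTime request into smaller chunks, and to let each one finish before sending the next, could be sketched roughly like this in Python. The names `chunked`, `post_get_data_at_time` and `get_at_time_in_chunks` are made up for this example, the endpoint shape (POST a JSON list of PV names to `.../data/getDataAtTime?at=<time>`) and the JSON-object response are assumptions to check against your appliance's documentation, and the default chunk size of 500 is an arbitrary starting point rather than a recommendation from anyone in the thread.

```python
import json
import urllib.parse
import urllib.request


def chunked(items, size):
    """Yield consecutive slices of items, each at most size long."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def post_get_data_at_time(base_url, at, pvs):
    # Assumed endpoint shape: POST a JSON list of PV names to
    # <base_url>/data/getDataAtTime?at=<ISO 8601 time>.
    query = urllib.parse.urlencode({"at": at})
    req = urllib.request.Request(
        base_url + "/data/getDataAtTime?" + query,
        data=json.dumps(pvs).encode("utf-8"),  # data= makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # assumed: a JSON object keyed by PV name


def get_at_time_in_chunks(pvs, at, post, chunk_size=500):
    """Issue one request per chunk, waiting for each before the next."""
    merged = {}
    for chunk in chunked(pvs, chunk_size):
        merged.update(post(at, chunk))
    return merged
```

Here `post` is any callable taking `(at, chunk)`, for example `lambda at, c: post_get_data_at_time(base, at, c)` with `base` set to your own retrieval URL. Timing each chunk separately would also show whether latency grows with request size.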