Hi Rod,
First of all, your regexp ^MeasurePM*([0-9.]+) requires that the input
starts with MeasurePM, which is not the case. The ^ at the beginning is the
problem: it does not refer to the beginning of a line but to the beginning of
the parsed input, that is, all input minus the text that has already been
parsed. In your case that is the complete reply from the web server, including
the HTTP header "HTTP/1.0 ...", which a browser does not show.
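The anchoring problem can be reproduced outside of StreamDevice. The following
is only a sketch: it uses Python's re module instead of PCRE, and the reply
string is an abbreviated stand-in (status line only, headers omitted), so it
shows the principle rather than what StreamDevice does internally.

```python
import re

# Abbreviated stand-in for the server reply: status line, headers omitted,
# then the part of the page that carries the reading.
reply = (
    "HTTP/1.0 200 OK\r\n"
    "\r\n"
    "<HTML><BODY><PRE>\n"
    "Measure(PPM): 2.3\n"
    "</PRE></BODY></HTML>"
)

# With ^ the pattern may only match at the very start of the input.
anchored = re.search(r"^Measure\(PPM\):\s*([0-9.]+)", reply)
# Without ^ the pattern may match anywhere in the input.
unanchored = re.search(r"Measure\(PPM\):\s*([0-9.]+)", reply)

print(anchored)             # None: the input starts with "HTTP/1.0"
print(unanchored.group(1))  # 2.3
```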
Parsing multiple values from one input is possible with the redirection syntax
%(otherrecord.VAL)..., but only if the values always come in the same order.
What you can do (assuming constant order):
* find "Measure(PPM):"
* read value
* find "Sample flow(CC):"
* read value
* find "Scale:"
* read value
* find "System status:"
* read value
Each find can be done with a regexp which is then ignored with %*:
read {
    extraInput=ignore;
    interminator = "</HTML>"; # terminators can have arbitrary length
    connect 1000; # connect to server, 1 second timeout
    out "GET http://142.90.132.71/lcd.cgi"; # HTTP request
    in "%*/Measure\(PPM\):/%f"
       "%*/Sample flow\(CC\):/%(\$1:FLOW.VAL)f"
       "%*/Scale:/%(\$1:SCALE)s"
       "%*/System status:/%(\$1:STATUS)s";
}
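The find-then-read sequence can be tried out in Python before touching the
IOC. This is a rough analogue only: the value patterns ([0-9.]+ for numbers,
\S+ for strings) are my assumptions about what the conversions should accept,
not a description of how StreamDevice implements %f and %s, and the function
name parse_in_order is made up for this sketch.

```python
import re

# Text of the instrument page as posted (HTML head omitted).
page = (
    "<<RUN MODE>> F4:RET\n"
    "Measure(PPM): 2.3\n"
    "Sample flow(CC): 50.0 Scale:0-10 PPM\n"
    "System status:OK </PRE></FONT></B></BODY></HTML>"
)

# (label to find and skip, value to read) in the order used on the page.
STEPS = [
    (r"Measure\(PPM\):", r"\s*([0-9.]+)"),
    (r"Sample flow\(CC\):", r"\s*([0-9.]+)"),
    (r"Scale:", r"\s*(\S+)"),
    (r"System status:", r"\s*(\S+)"),
]

def parse_in_order(text):
    """Consume text left to right: skip to each label, then read one value."""
    pos = 0
    values = []
    for label, value in STEPS:
        found = re.compile(label).search(text, pos)
        if found is None:
            raise ValueError("label not found after position %d: %s" % (pos, label))
        read = re.compile(value).match(text, found.end())
        if read is None:
            raise ValueError("no value after label: %s" % label)
        values.append(read.group(1))
        pos = read.end()  # later searches only see the remaining input
    return values

print(parse_in_order(page))  # ['2.3', '50.0', '0-10', 'OK']
```

Because each search starts where the previous read ended, a page that lists
the fields in a different order makes a later step fail, which is the
constant-order restriction mentioned above.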
The "master" record is PPM. It is the only one with StreamDevice support. The
other records (FLOW, SCALE, STATUS) are passive soft records. Their common
device name prefix is passed to the protocol as argument #1 and replaces \$1
in the redirected formats.
record (ai, "ISAC2:N2ANAL1:PPM")
{
    field (DTYP, "stream")
    field (FLNK, "ISAC2:N2ANAL1:FLOW")
    field (INP, "@regexp.proto read(ISAC2:N2ANAL1) n2anal1")
}
record (ai, "ISAC2:N2ANAL1:FLOW")
{
}
record (stringin, "ISAC2:N2ANAL1:SCALE")
{
}
record (stringin, "ISAC2:N2ANAL1:STATUS")
{
}
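As a quick cross-check of the naming, this small Python sketch expands the
redirection targets with the argument from the INP link and compares them with
the record names in the database. The helper name expand is made up for the
illustration, and plain text substitution is my simplification of what the
protocol parser does with \$1.

```python
# Redirection targets as written in the protocol (without the string escape).
TARGETS = ["$1:FLOW.VAL", "$1:SCALE", "$1:STATUS"]

# Argument list from the INP link: read(ISAC2:N2ANAL1)
ARGS = ["ISAC2:N2ANAL1"]

# Passive soft records defined in the database.
RECORDS = {"ISAC2:N2ANAL1:FLOW", "ISAC2:N2ANAL1:SCALE", "ISAC2:N2ANAL1:STATUS"}

def expand(target, args):
    """Replace $1, $2, ... with the corresponding protocol argument."""
    for number, arg in enumerate(args, start=1):
        target = target.replace("$%d" % number, arg)
    return target

expanded = [expand(target, ARGS) for target in TARGETS]
print(expanded)
# ['ISAC2:N2ANAL1:FLOW.VAL', 'ISAC2:N2ANAL1:SCALE', 'ISAC2:N2ANAL1:STATUS']

# Every target (ignoring an optional field suffix) must be an existing record.
missing = [name for name in expanded if name.split(".")[0] not in RECORDS]
print(missing)  # []
```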
Maybe you want to parse STATUS with %{OK|...} and put it into an mbbi record.
The same goes for SCALE if there is only a small set of scales.
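The idea behind the enum format is that the input is compared against a list
of strings and the index of the matching string becomes the value. A minimal
Python sketch of that idea follows. Note that only "OK" is known from the
sample page; "FAULT" is a hypothetical placeholder for whatever other states
the instrument reports, and enum_index is a name made up for this example.

```python
# Candidate strings in enum order. Only "OK" appears on the sample page;
# "FAULT" is a hypothetical placeholder, not a documented instrument state.
STATES = ["OK", "FAULT"]

def enum_index(text, states=STATES):
    """Return the index of the first state string that text starts with."""
    for index, state in enumerate(states):
        if text.startswith(state):
            return index
    raise ValueError("input matches none of the states: %r" % text)

print(enum_index("OK </PRE>"))  # 0
```

The resulting index is what would end up as the value of an mbbi record, with
the state strings configured in the record for display.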
I hope that helps,
Dirk
Rod Nussbaumer wrote:
Greetings, EPICS users.
I am trying to use streamDevice with regular expression support to parse
input from an instrument containing an embedded web server. So far, I've
successfully built and run the streamApp sample IOC that gets built with
the streamDevice distribution. I am able to parse a single HTML element
(TITLE, with minor mods to the example protocol file) from a static web
page served by an Apache web server. So, I think I've built everything
properly (streamDevice 2.4, asyn 4-9, EPICS 3.14.10, Linux=Scientific
Linux 4.x).
My difficulty is that the instrument produces a page containing three
strings that I wish to extract into three EPICS records. I'm testing
with string-in records for now, but eventually I would prefer to scan to
analog-in records. I am able to extract the first element from the HTML,
although I do get error messages from streamDevice:
"Input "HTTP/1.0 200 OK<0d><0a>Con..." does not match format
%.1/^MeasurePM*([0-9.]+)/"
However, I am unsure about how to get multiple fields scanned from the
HTML using a single HTTP GET. I would really like to avoid doing one
whole HTTP fetch per record, since the instrument is fairly primitive,
and can probably become overwhelmed by excessive traffic, and it may be
the case that readings on a single fetch are synchronized in time.
From my interpretation of the documentation, the scanner should stop,
leaving unread text in the pipeline once it has made a successful match.
However, I don't understand how to continue scanning the remainder of
the HTML page to extract the remaining data.
I attach below, the streamDevice protocol file I'm presently using (the
latest flavor that gets me closest to working), the relevant database,
and the entire HTML that the instrument returns. Much of this is based
on the examples from streamDevice.
#=========================<protocol file>=============================
# regular expression example
# extract the title from a web page
outterminator = NL;
interminator = "</HTML>"; # terminators can have arbitrary length
# Web servers close the connection after sending a page.
# Thus, we can't use autoconnect (see drvAsynIPPortConfigure)
# Handle connection manually in protocol.
readTitle {
extraInput=ignore;
interminator = "</HTML>"; # terminators can have arbitrary length
connect 1000; # connect to server, 1 second timeout
out "GET http://isacwserv.triumf.ca"; # HTTP request
in "%.1/<TITLE>(.+)<\/TITLE>/"; # get string in <TITLE></TITLE>
disconnect;
}
readPPM {
extraInput=ignore;
interminator = "</HTML>"; # terminators can have arbitrary length
connect 1000; # connect to server, 1 second timeout
out "GET http://142.90.132.71/lcd.cgi"; # HTTP request
in "%.1/^Measure\(PPM\):\s*([0-9.]+)/"; # get string in Measure field
}
readFLOW {
extraInput=ignore;
in "%.1/^Sample flow\(CC\):\\s*([0-9.]+)/"; # get string in Flow field
}
readSCALE {
extraInput=ignore;
in "%.1/Scale:\s*(.+)\n/"; # get string in Scale field
disconnect; # server closes, so do we.
}
#==================< EPICS db file >===========================
record (stringin, "ISAC:WSERV:TITLE")
{
field (DTYP, "stream")
field (INP, "@regexp.proto readTITLE isacwserv")
}
record (stringin, "ISAC2:N2ANAL1:PPM")
{
field (DTYP, "stream")
field (FLNK, "ISAC2:N2ANAL1:FLOW")
field (INP, "@regexp.proto readPPM n2anal1")
}
record (stringin, "ISAC2:N2ANAL1:FLOW")
{
field (DTYP, "stream")
field (FLNK, "ISAC2:N2ANAL1:SCALE")
field (INP, "@regexp.proto readFLOW n2anal1")
}
record (stringin, "ISAC2:N2ANAL1:SCALE")
{
field (DTYP, "stream")
field (INP, "@regexp.proto readSCALE n2anal1")
}
#==================< Instrument web page >=======================
# (additional line breaks inserted by e-mail; hopefully this gets
# displayed as HTML text, and not as a web page)
#
<HTML><HEAD><meta http-equiv="refresh" content="1"></HEAD><BODY
bgcolor="#BEF06E"><!-- Affiche le contenu de l'écran -->
<FONT color="black" size="6"><B><PRE>
<<RUN MODE>> F4:RET
Measure(PPM): 2.3
Sample flow(CC): 50.0 Scale:0-10 PPM
System status:OK </PRE></FONT></B></BODY></HTML>
===============================================================
I am trying to extract the values for 'Measure (PPM)', 'flow(CC)', &
'Scale:'. When the 'FLOW' and 'SCALE' records process, streamDevice
reports:
2009/03/04 10:49:59.678 n2anal1 ISAC2:N2ANAL1:PPM: Input "HTTP/1.0 200
OK<0d><0a>Con..." does not match format %.1/^MeasurePM*([0-9.]+)/
2009/03/04 10:49:59.680 n2anal1 ISAC2:N2ANAL1:FLOW: asynError in read:
142.90.132.71:80 connection closed
2009/03/04 10:49:59.680 n2anal1 ISAC2:N2ANAL1:FLOW: I/O error after
reading 0 bytes: ""
2009/03/04 10:49:59.680 n2anal1 ISAC2:N2ANAL1:FLOW: Protocol aborted
2009/03/04 10:50:00.677 timerQueue ISAC2:N2ANAL1:SCALE: I/O error after
reading 0 bytes: ""
2009/03/04 10:50:00.677 timerQueue ISAC2:N2ANAL1:SCALE: Protocol aborted
If I add some more backslashes to the regex in front of the parentheses
that are embedded in the HTML text, the error messages more closely
match the actual HTML text. I can't seem to find a 'right' combination
of back-slashes that makes streamDevice and PCRE happy.
Thanks for any insight.
Rod Nussbaumer
ISAC Controls, TRIUMF
Vancouver, Canada.
--
Dr. Dirk Zimoch
Paul Scherrer Institut, WBGB/006
5232 Villigen PSI, Switzerland
Phone +41 56 310 5182