Subject: | Re: CA V4 Protocol Specification |
From: | Marty Kraimer <[email protected]> |
To: | [email protected] |
Date: | Wed, 26 Oct 2005 06:55:31 -0500 |
Benjamin Franksen wrote:
> On Wednesday 26 October 2005 00:33, Jeff Hill wrote:
>> Attached is a rough cut at the EPICS V4 CA protocol specification.
>
> Hi Jeff,
>
> Just one little thought about the STRING data type:
>
>     UINTN             the number of UTF-8 tokens
>     OCTET sequence    UTF-8 encoded character string sequence
>
> I take it that 'number of UTF-8 tokens' means 'number of octets', right?
> Maybe it would be worthwhile to consider adding a 'number of /characters/'
> count in addition to the byte count. This could improve performance,
> particularly when converting to other encodings on the client side. Of
> course any gain must be offset against the increased protocol overhead.
> The same applies of course to any library string representations.
>
> Ben
Java 5 uses 16 bits for char, which is not sufficient to encode all Unicode characters. It uses two consecutive chars (a surrogate pair) to hold a Unicode character that does not fit in 16 bits.
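The distinction between octets, chars, and characters can be seen directly in Java; a minimal sketch (the class name and the sample string, a supplementary character outside the Basic Multilingual Plane, are arbitrary choices for illustration):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Counts {
    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the Basic
        // Multilingual Plane, so Java stores it as a surrogate pair.
        String s = "A\uD834\uDD1E";

        // UTF-16 code units ("chars"): 1 for 'A' + 2 for the clef.
        System.out.println(s.length());                                 // 3
        // Unicode characters (code points): 'A' and the clef.
        System.out.println(s.codePointCount(0, s.length()));            // 2
        // UTF-8 octets: 1 for 'A' + 4 for the clef.
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);  // 5
    }
}
```

All three counts differ for the same string, which is why a "number of characters" field on the wire would not tell a UTF-16-based receiver how many chars to allocate.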
At least some C/C++ implementations use 32 bits for wchar_t, which is sufficient for all Unicode characters.
But what if an implementation uses 16 bits? How, then, would the number of characters in a UTF-8 string be used? Better to just let the final sender/receiver of the character string handle it. Marty