discuss: HOWTO on Open Source Databases
Subject: Re: HOWTO on Open Source Databases
From: Edward Cherlin ####@####.####
Date: 5 Apr 2005 17:59:10 -0000
Message-Id: <200504051106.56827.edward.cherlin@etssg.com>
On Friday 01 April 2005 06:55, Saqib Ali wrote:
> Hello Vikas,
>
> If you have already started the HOWTO, then it should be OK.
> However, I think there is a lot one can write about differences
> between various DBs, e.g.
>
> 1) Support for foreign keys, referential integrity, etc.
> 2) Storage format
> 3) Support for transactions, ACID compliance, etc.
> 4) Differences in the licenses they use: commercial vs.
>    non-commercial use
> 5) Available tools for managing the DB
> 6) Schema builders, etc.
Also, the degree of support for Unicode.
What we want to begin with is correct storage of appropriately
normalized data in one or more of the up-to-date standard
formats, either chosen globally or tagged. UTF-8 and UTF-16 are
generally preferred. Handling of surrogates in UTF-16 is
essential for Hong Kong Chinese and for various historical data.
UTF-8 data must be in shortest form. (This is a security issue:
non-shortest-form UTF-8 can encode control codes that slip past
filters, and such codes are capable of taking over some
terminals.) The use of normalized data is
essential for supporting proper searching, that is, for getting
correct results from queries. Options for unnormalized data and
other formats would be acceptable and sometimes desirable, so
that raw inputs can be stored and analyzed, but they should
perhaps be treated simply as binary data.
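A minimal Python sketch of the storage rules above, assuming text arrives as raw bytes and that NFC is the chosen normalization form (the message does not name one). `normalize_for_storage` is a hypothetical helper, not part of any database API. Python's built-in UTF-8 codec already rejects non-shortest-form sequences, so a strict decode refuses them before normalization:

```python
import unicodedata


def normalize_for_storage(raw: bytes) -> str:
    # Hypothetical ingest helper: strict UTF-8 decode, then NFC.
    # Python's UTF-8 codec raises UnicodeDecodeError on overlong
    # (non-shortest-form) sequences and on encoded surrogates.
    text = raw.decode("utf-8", errors="strict")
    return unicodedata.normalize("NFC", text)


# An overlong two-byte encoding of "/" (U+002F) is rejected.
try:
    normalize_for_storage(b"\xc0\xaf")
except UnicodeDecodeError:
    print("rejected non-shortest-form UTF-8")

# "e" + COMBINING ACUTE ACCENT and precomposed U+00E9 match after NFC.
decomposed = "e\u0301".encode("utf-8")
precomposed = "\u00e9".encode("utf-8")
print(normalize_for_storage(decomposed) == normalize_for_storage(precomposed))  # True

# A character outside the BMP needs a surrogate pair in UTF-16.
print("\U00020000".encode("utf-16-be").hex())  # d840dc00
```

Which form to store (NFC, NFD, or a compatibility form) is a per-database design choice; what matters for searching is that stored data and query strings go through the same normalization.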
Next, we want correct Unicode searching, including extended
regular expressions with Unicode character class identifiers,
within the query language. kregexp is the standard KDE widget
for composing such expressions, and every major programming
language is adding its own version of regexps.
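Python's standard `re` module shows how partial such support can be: `\w` is Unicode-aware for `str` patterns, but named classes such as `\p{L}` or `\p{Han}` are not supported (the third-party `regex` module adds them). The rough sketch below uses only the standard library and approximates a `\p{L}+` match by checking each character's general category with `unicodedata`; `is_letters` is a hypothetical helper, not a real class identifier:

```python
import re
import unicodedata


def is_letters(token: str) -> bool:
    # Hypothetical approximation of \p{L}+ : non-empty, and every
    # character's Unicode general category starts with "L" (letter).
    return bool(token) and all(unicodedata.category(c).startswith("L") for c in token)


text = "na\u00efve caf\u00e9 \u65e5\u672c\u8a9e 123"

# In Python 3, \w matches letters and digits from any script, not just ASCII.
words = re.findall(r"\w+", text)
print(words)  # ['naïve', 'café', '日本語', '123']

# Keep only the tokens made entirely of letters.
print([w for w in words if is_letters(w)])  # ['naïve', 'café', '日本語']
```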
Implementation of the Unicode Collation Algorithm with options
for more linguistically correct sorting is also important.
Sorting on Unicode code point values alone gives quite bad
results, since a number of writing systems were allocated more
than one code block, so related letters (Latin A and Ä, for
example) can end up far apart.
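A short Python sketch of the problem: plain `sorted` compares code points, so a word starting with Ä (Latin-1 Supplement) lands after every word starting with a Basic Latin letter. The `rough_primary_key` function is a hypothetical stand-in and is NOT the Unicode Collation Algorithm; it only decomposes, drops combining marks, and casefolds, which crudely mimics a primary-strength comparison for some Latin text. Real UCA-based collation is available from libraries such as ICU.

```python
import unicodedata


def rough_primary_key(s: str) -> str:
    # NOT the UCA: decompose (NFD), drop combining marks, then casefold.
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.casefold()


words = ["Zebra", "apple", "\u00c4pfel"]

# Code point order: "Z" (U+005A) < "a" (U+0061) < "Ä" (U+00C4).
print(sorted(words))  # ['Zebra', 'apple', 'Äpfel']

# Crude linguistic-style order: "apfel" < "apple" < "zebra".
print(sorted(words, key=rough_primary_key))  # ['Äpfel', 'apple', 'Zebra']
```

Even the "right" order is language-dependent: Swedish, for instance, sorts Ä after Z, which is why tailoring options on top of the UCA matter.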
The database engine does no display, and needs no rendering,
including Bidi support, but some of the tools used with the
engine, particularly any GUI frontends, need to support correct
Unicode display in all languages, and in any mixture of
languages. Pango is the standard rendering engine on Linux, with
Graphite and Scribe in the offing.
> Let me know if you are interested in working on something like
> this. I was planning to work on it, but if you are interested,
> let me know.
>
> Thanks.
Keep me posted also. I can provide much more detail on Unicode
requirements and possibilities, and I would like to be able to
refer to your work in the Unicode HOWTO in future.
--
Edward Cherlin, Simputer Evangelist
Encore Technologies (S) Pte. Ltd.
The Village Information Society
http://cherlin.blogspot.com