discuss: HOWTO on Open Source Databases

Previous by date:	5 Apr 2005 17:59:10 -0000 Re: Trojan files on TLDP server?, Yves Bellefeuille
Next by date:	5 Apr 2005 17:59:10 -0000 http://www.ibiblio.org/pub/Linux/docs/HOWTO/other-formats/ only has .tar.gz files, Ian Hakes
Previous in thread:	5 Apr 2005 17:59:10 -0000 Re: HOWTO on Open Source Databases, Saqib Ali
Next in thread:

Subject: Re: HOWTO on Open Source Databases
From: Edward Cherlin ####@####.####
Date: 5 Apr 2005 17:59:10 -0000
Message-Id: <200504051106.56827.edward.cherlin@etssg.com>

On Friday 01 April 2005 06:55, Saqib Ali wrote:
> Hello Vikas,
>
> If you have already started the HOWTO, then it should be OK.
> However, I think there is a lot one can write about differences
> between various DBs, e.g.
>
> 1) Support for foreign keys, referential integrity etc
> 2) Storage format
> 3) Support for Transaction, ACID compliance etc
> 4) Difference between the licensing they use. Commercial vs
>    non-commercial use
> 5) Available tools for managing DB
> 6) Schema builders etc

Also, the degree of support for Unicode. 

What we want to begin with is correct storage of appropriately 
normalized data in one or more of the up-to-date standard 
formats, either chosen globally or tagged. UTF-8 and UTF-16 are 
generally preferred. Handling of surrogates in UTF-16 is 
essential for Hong Kong Chinese and for various historical data. 
UTF-8 data must be in shortest form. (This is a security issue, 
since non-shortest-form UTF-8 can smuggle control codes past 
input filters, and such codes are capable of taking over some 
terminals.) The use of normalized data is 
essential for supporting proper searching, that is, for getting 
correct results from queries. Options for unnormalized data and 
other formats would be acceptable and sometimes desirable, so 
that raw inputs can be stored and analyzed, but they should 
perhaps be treated simply as binary data.
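
As a rough illustration of the storage points above, here is a 
minimal Python sketch. The function names (to_storage_form, 
from_untrusted_bytes) are made up for this example, and the 
choice of NFC is only one of the possible normalization forms; a 
real database would do this inside the engine or its driver.

```python
import unicodedata

def to_storage_form(text: str) -> bytes:
    """Normalize to NFC, then encode as UTF-8 (hypothetical helper)."""
    return unicodedata.normalize("NFC", text).encode("utf-8")

def from_untrusted_bytes(raw: bytes) -> str:
    """Decode strictly; Python's decoder rejects non-shortest-form UTF-8."""
    return raw.decode("utf-8", errors="strict")

# The same visible character, stored two different ways:
composed = "\u00e9"        # e with acute, one code point
decomposed = "e\u0301"     # e followed by a combining acute accent
same_after_normalizing = to_storage_form(composed) == to_storage_form(decomposed)

# 0xC0 0x80 is a non-shortest-form encoding of U+0000 and must be refused.
try:
    from_untrusted_bytes(b"\xc0\x80")
    overlong_accepted = True
except UnicodeDecodeError:
    overlong_accepted = False

# U+20000 lies outside the BMP, so UTF-16 stores it as a surrogate pair.
supplementary = "\U00020000"
utf16_byte_count = len(supplementary.encode("utf-16-be"))
```

Without the normalization step, a query for the composed form 
would miss rows stored in the decomposed form, which is the 
searching problem described above.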

Next, we want correct Unicode searching, including extended 
regular expressions with Unicode character class identifiers, 
within the query language. KRegExpEditor is the standard KDE widget 
for composing such expressions, and every major programming 
language is adding its own version of regexps.
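
To show what matching on a Unicode character class means in 
practice, here is a small Python sketch. Python's standard re 
module has no property classes of the \p{L} kind, so the helper 
find_runs (a name invented for this example) looks up the general 
category of each character directly. The re pattern at the end is 
only an approximation of "any letter" and happens to agree on 
this sample text.

```python
import re
import unicodedata

def find_runs(text: str, category_prefix: str) -> list[str]:
    """Return maximal runs of characters whose Unicode general
    category starts with category_prefix, e.g. "L" for letters."""
    runs: list[str] = []
    current: list[str] = []
    for ch in text:
        if unicodedata.category(ch).startswith(category_prefix):
            current.append(ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs

text = "Москва 2005, 東京 and Zürich"

# Letters in any script, found by general category.
letter_runs = find_runs(text, "L")

# Approximation with the stdlib: word characters minus digits and underscore.
word_runs = re.findall(r"[^\W\d_]+", text)
```

A query language with real Unicode class identifiers would let 
the user write the class name in the pattern instead of calling 
a helper like this.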

Implementation of the Unicode Collation Algorithm with options 
for more linguistically correct sorting is also important. 
Sorting on Unicode code point values alone gives quite bad 
results, since a number of writing systems were allocated more 
than one code block.
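
A short Python sketch makes the problem visible. Note that 
rough_sort_key is a name invented here and is NOT the Unicode 
Collation Algorithm; it merely strips combining marks and folds 
case, which is enough to show how far code point order is from 
what a reader expects. A real implementation would use a proper 
UCA library with tailoring for the language at hand.

```python
import unicodedata

def rough_sort_key(s: str) -> tuple[str, str]:
    """Crude stand-in for a collation key: compare base letters
    first, then fall back to the original string as a tiebreak."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), s)

words = ["zebra", "Äpfel", "apple", "Zürich", "éclair"]

# Plain code point order: capitals first, accented letters after "z",
# because accented Latin letters live in a different code block.
by_code_point = sorted(words)

# Rough key: accented and capital letters sort next to their base letters.
by_rough_key = sorted(words, key=rough_sort_key)
```

Here by_code_point puts "Äpfel" and "éclair" after "zebra", 
while by_rough_key places them beside "apple" and before 
"zebra".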

The database engine does no display, so it needs no rendering 
or Bidi support, but some of the tools used with the 
engine, particularly any GUI frontends, need to support correct 
Unicode display in all languages, and in any mixture of 
languages. Pango is the standard rendering engine on Linux, with 
Graphite and Scribe in the offing.

> Let me know if you are interested in working on something like
> this. I was planning to work on this, but if you are interested
> let me know.
>
> Thanks.

Keep me posted also. I can provide much more detail on Unicode 
requirements and possibilities, and I would like to be able to 
refer to your work in the Unicode HOWTO in future.
-- 
Edward Cherlin, Simputer Evangelist
Encore Technologies (S) Pte. Ltd.
The Village Information Society
http://cherlin.blogspot.com
