discuss: Review of GNU/Linux Tools Summary


Previous by date: 15 Dec 2003 07:28:28 -0000 Re: Review of GNU/Linux Tools Summary, doug jensen
Next by date: 15 Dec 2003 07:28:28 -0000 DocBook created html may break links but LinuxDoc OK, David Lawyer
Previous in thread: 15 Dec 2003 07:28:28 -0000 Re: Review of GNU/Linux Tools Summary, doug jensen
Next in thread: 15 Dec 2003 07:28:28 -0000 Re: Review of GNU/Linux Tools Summary, Martin WHEELER

Subject: Re: Review of GNU/Linux Tools Summary
From: David Lawyer ####@####.####
Date: 15 Dec 2003 07:28:28 -0000
Message-Id: <20031215072745.GA897@lafn.org>

On Mon, Dec 15, 2003 at 03:52:41AM +0000, Chris Karakas wrote:
> David Lawyer ####@####.#### wrote:
> >
> >On Sun, Dec 14, 2003 at 10:45:08AM +0000, Chris Karakas wrote:
> 
> >> Why? Because Chapters and sections will become separate HTML
> >> documents.
> >
> >Not always so.  The text format is a single doc.  Also, LDP provides
> >html docs both in multiple pages (separate HTML docs) and in single
> >pages.  Search the Internet for HOWTOs and you'll find single page ones
> >also (in text, html, pdf).
> >
> 
> You are correct only in part, the part that refers to "single chunk
> documents". 

I'm fully correct since text and pdf are also in one part.

> But my concern was not those documents.   Why? Because I have yet to
> type
> 
> "network commands"
> 
> into Google and land on the one huge HTML, or txt, or even PDF
> document. I almost always land on the "chunked" version. The
> above choice of keywords and search engine are just an example. We can
> take whatever keywords and whatever search engine you like - if there
> is a chunked version there, you will get the chunked version.
> 
> There are various reasons for this, one of them being that search
> engines don't read an overly long document all the way to the end.

I just googled for my Modem-HOWTO.  I got hundreds of hits and some of
the long ones were near the beginning (like say #8).  In fact, most of
the hits were on the single chunk ones (including txt, pdf, ps).  The
problem is that the majority of sites have stale documentation.  The
first 3 hits were over 2 years old, while the latest version is one
month old.  (Actually the 2nd hit didn't even contain the howto so I'm
counting a null howto as infinitely old.)

> 
> So forget about the huge, one-chunk docs, as a search engine strategy.
> If you want to be found by the SEs, you must rely on the chunked
> versions - and perhaps a little on PDF, but only a little.

Unless there are no chunked versions.  

> But my point lies even further: we are not talking about a user who is
> searching for a unique, multiple keyword phrase that identifies the
> content of your reorganized document. We are talking about a user who
> just searches for, say, two keywords, for the sake of example:
> "network commands".
> 
> If you change the label, you change the filename of the chunked
> version. 

Not so in my chunked html and in the chunked html at LDP for docs
written with LinuxDoc.  The labels are numeric and whatever name or
label you selected for that section is ignored.  Here are the chunks of
my article on fuel efficiency in the 20th century.  These chunk names
stay the same regardless of what names I give to the chunk (section
name) or what label I use in LinuxDoc-sgml like <label id="appendix_">

fuel-eff-20th-1.html
fuel-eff-20th-2.html
fuel-eff-20th-3.html
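
To illustrate the difference, here is a minimal sketch in Python.  It is
not the actual LinuxDoc or DocBook tooling: the function names, the
example labels, and the rule for turning a label into a filename are all
assumptions made for illustration.  It only shows why position-based
names survive a label change while label-derived names do not.

```python
import re


def numeric_chunk_names(base, section_count):
    """Name chunks by position only: base-1.html, base-2.html, ..."""
    return [f"{base}-{i}.html" for i in range(1, section_count + 1)]


def label_chunk_names(labels):
    """Name chunks after their labels (hypothetical slug rule)."""
    names = []
    for label in labels:
        slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
        names.append(slug + ".html")
    return names


# Hypothetical section labels before and after a revision of the doc.
old_labels = ["Introduction", "Network Commands", "Appendix"]
new_labels = ["Introduction", "Networking Tools", "Appendix"]

numeric_before = numeric_chunk_names("fuel-eff-20th", len(old_labels))
numeric_after = numeric_chunk_names("fuel-eff-20th", len(new_labels))
label_before = label_chunk_names(old_labels)
label_after = label_chunk_names(new_labels)

# Numeric names are identical before and after the relabeling.
print(numeric_before == numeric_after)
# Label-derived names that no longer exist are the broken links.
print(sorted(set(label_before) - set(label_after)))
```

In this sketch the numeric names depend only on how many sections there
are and in what order, so renaming a section changes nothing, whereas
the label-derived name for the renamed section disappears.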

> If you do this, the search engine will NOT think "Ahh...the file
> network-commands.html is not there, let's present the huge document
> that contains the whole HOWTO - at the same ranking place!"
> 
> First, the SE does not know that network-commands.html is just a chunk
> of some "whole" document, book1.html. 

But for me network-commands.html would be say book1-7.html and the SE
will still find this.  

> There is nothing that a SE does to find this out - not with today's
> technology. The two documents are different for the SE.
> 
> Second, the big one, book1.html, contains much more text, therefore
> the importance of the "network commands" part of it is "diluted" from
> the surrounding, irrelevant text (irrelevant to what the user is
> searching, "network commands"). Therefore, the document will rank at a
> place that is way back - not visible, dead.

Well, I've got mostly single chunk docs on my website (with no multiple
chunk versions) and they all get found.

> 
> Third, you may put it on TLDP, that alone does not guarantee good
> ranking. What is also important, is that people *link* to it. But if
> you change an existing label, thus changing the filename of the
> chunked version (the only one that matters from the SE point of view,
> for the reasons stated above), then you kill all the links to the
> previous URL. You kill what you were able to gather up to that point
> in terms of SE visibility. You start anew. See my post to Martin
> Wheeler for a more detailed description of this.
 
But that's not how it works for me or at LDP for LinuxDoc documents.
You must be using DocBook to generate HTML.  Why don't you use the
LinuxDoc method and avoid this problem :-)

At this point I visited the LDP site and discovered that the last
sentence is no joke.  I then went back and did some editing of what I
originally said.  DocBook behaves just like you claim while LinuxDoc
behaves just as I claim.  So we are both right.  Another reason for
using LinuxDoc.

> >Google has cached versions so if one can't find something due to a
> >change in the link, then they can always look at the (old) cached
> >one.
> 
> No. Google's cache will not remain there forever. Most people don't
> even realize it is there. After a few months, the cached version will
> disappear too. What then? Do we start at rank 1 million out of 2.5
> million again?
> 
> >But the most common case may be where one reaches the wrong chapter,
> >etc.
> 
> That's why I say "think about it". I don't say "when you reorganize,
> put content regarding editor commands in the chapter with the label
> 'network commands'". I say: try to keep label and content in sync, of
> course. You may rephrase, delete and insert text in the chapter or
> section with the label "network commands" as you please - you may even
> change its title, somewhat. But of course, the content should pertain
> to network commands, otherwise we have the situation you describe.
> 
> I guess this is not difficult to achieve: either there already exists
> a chapter/section on editor commands, in which case we put the extra
> content there, or there isn't and we create one. But we don't let an
> existing chapter or section just disappear. We'd better think twice
> about our labels at the start and choose them in a way that fits our
> purposes, but we don't throw them away in the middle of a document's
> life. We kill the document if we do.

Not if you switch to LinuxDoc :-).  You can create the same results
with DocBook by making the labels numeric, but it's messy and a lot more
work if you modify the doc.

> 
> > But since each chapter has a link to the table of contents, then
> >they can still find what they are looking for.
> >
> 
> No they will not. They want to find what they are looking for, *here*
> and *now*.  Only 1% will search further. I have had people ask me in
> emails if I have a PDF, although the Formats section is there to see
> in the ToC. "Ahh...I must have been blind" was the reaction, when I
> pointed them to it with a link. They were right, but it does me no
> good if they go and never come back again because "the PDF was
> not there".
> 
> Want more? I have had people tell me that a link I gave them was
> broken, just because there was a dot at the end. Read the debate:
> 
> http://www.nukeforums.com/forums/viewtopic.php?t=18185
> 
> >Even "nowhere" will be found if one uses the exact search terms and if
> >you have something unique to offer.
> 
> No. Definitely not.

Well, I think that most of the docs on my personal site are not linked
to from anywhere else (except of course from search engines).  Yet
Google finds them.  They even pop up first if I search for the unique
info found in my writings.  People do find my docs and send me email
about them even though no one links to them.

> You will never find a document you killed this way again. Not if it
> was #15 out of 2.5 million before you killed it and not with the same,
> simple keyword combination. 95% of all people who reach a document
> from a search engine come from the first 3-5 result pages
> of that SE. That's my experience and it is shared among other
> webmasters. Changing the label brings the document (that chunk) back
> to result page 100000, rank 1 million (roughly, plus or minus a few
> hundred thousand), 10 results per page. And people will search for
> "network commands", which brings up 2.5 million results, not for
> "Displays contents of /proc/net files. It works with the Linux Network
> Subsystem", which uniquely identifies that chunk. People don't know
> that such a string exists. And for all the other strings they can
> think of, the SEs will spit millions of results.
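
As a quick check on the figures quoted above, the result page for a given
rank at 10 results per page can be worked out as follows.  This is only
a sketch of the arithmetic; the function name and the fixed page size are
assumptions for illustration, not anything a search engine publishes.

```python
def result_page(rank, results_per_page=10):
    """Return the 1-based result page on which a given rank appears."""
    # Integer ceiling division: ranks 1-10 are on page 1, 11-20 on page 2.
    return (rank + results_per_page - 1) // results_per_page


page_for_rank_15 = result_page(15)
page_for_rank_1_million = result_page(1_000_000)

print(page_for_rank_15)
print(page_for_rank_1_million)
```

Rank 15 lands on page 2, well inside the first 3-5 pages, while rank
1 million lands on page 100000, which matches the figure quoted above.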

Much of the time when I search for something on Google, I only get a few
hits.  Sometimes I get no hits.  You're right that many people don't use
enough search terms to find just what they want.
> 
> >But if it's at LDP, then it will get a high ranking.  Thus I don't
> >think much about search engines when writing or revising a doc.

When I search for info to put into my HOWTOs, I usually get unwanted
hits on my own HOWTOs ad nauseam.  Thus I think that they must have a
high ranking.

> 
> Well, I see it. ;-) Please do. Here's a start:
> 
> http://www.webmasterworld.com/forum3/2010.htm

I'm striving for quality rather than ratings.  Quality in the long run
might lead to high rating.  The existing rating system is bound to
change for better or worse and I hope it's for better.

> >  However, since I'm told that some search engines don't give much
> >weight to keywords, I try to put synonyms into the body of the doc
> >to help people find it.
> >
> 
> You are confusing meta-keywords in the header and keywords in the text
> body. You mean the meta-keywords. I am talking about keywords in the
> body.

Right.  I meant meta-keywords since I seldom see keywords in the body
and didn't think of that possibility.

> 
> -- -- Regards
> 
> Chris Karakas http://www.karakas-online.de
> 
			David Lawyer



  ©The Linux Documentation Project, 2014. Listserver maintained by dr Serge Victor on ibiblio.org servers.