Marc Cromme [Fri, 15 Aug 2003 13:17:01 +0000 (13:17 +0000)]
first working debian package now
Marc Cromme [Fri, 15 Aug 2003 12:49:03 +0000 (12:49 +0000)]
initial debian package made
Marc Cromme [Thu, 14 Aug 2003 10:18:24 +0000 (10:18 +0000)]
no specific tkl stuff present any more
Marc Cromme [Thu, 14 Aug 2003 08:19:15 +0000 (08:19 +0000)]
diverse rules added
Marc Cromme [Thu, 14 Aug 2003 08:17:05 +0000 (08:17 +0000)]
init script for package tkl-web-harvester added
Marc Cromme [Thu, 14 Aug 2003 08:02:10 +0000 (08:02 +0000)]
tcl web harvesting script for tkl project added
Adam Dickmeiss [Wed, 11 Jun 2003 10:29:41 +0000 (10:29 +0000)]
xmlwf output
Adam Dickmeiss [Wed, 11 Jun 2003 10:11:39 +0000 (10:11 +0000)]
XML headers with character encoding as specified by HTTP server
Adam Dickmeiss [Wed, 11 Jun 2003 09:40:22 +0000 (09:40 +0000)]
Use suffix _.tkl
Adam Dickmeiss [Wed, 11 Jun 2003 08:49:09 +0000 (08:49 +0000)]
robotSeq(t) moved to control(task,seq)
Adam Dickmeiss [Tue, 10 Jun 2003 13:16:16 +0000 (13:16 +0000)]
Fix default rule match
Adam Dickmeiss [Tue, 10 Jun 2003 12:46:04 +0000 (12:46 +0000)]
Fix unlink
Adam Dickmeiss [Tue, 10 Jun 2003 12:29:48 +0000 (12:29 +0000)]
Deny is default
Adam Dickmeiss [Tue, 10 Jun 2003 12:12:35 +0000 (12:12 +0000)]
Ignore non-task files
Adam Dickmeiss [Tue, 10 Jun 2003 12:08:17 +0000 (12:08 +0000)]
Fixes for tasks with full paths
Adam Dickmeiss [Tue, 10 Jun 2003 11:55:57 +0000 (11:55 +0000)]
README updates
Adam Dickmeiss [Tue, 10 Jun 2003 11:55:18 +0000 (11:55 +0000)]
Usage
Adam Dickmeiss [Tue, 10 Jun 2003 11:43:52 +0000 (11:43 +0000)]
Tasks. TKL integration
Adam Dickmeiss [Mon, 13 Jan 2003 13:59:07 +0000 (13:59 +0000)]
Fix check for content-type
Adam Dickmeiss [Fri, 20 Sep 2002 09:45:02 +0000 (09:45 +0000)]
Look for Tcl on Debian systems
Adam Dickmeiss [Tue, 18 Jun 2002 19:57:53 +0000 (19:57 +0000)]
unset meta attributes (so they are reset for next meta)
Adam Dickmeiss [Mon, 25 Mar 2002 16:13:21 +0000 (16:13 +0000)]
Remove code that skips ?'s in URL
Adam Dickmeiss [Mon, 25 Mar 2002 16:11:08 +0000 (16:11 +0000)]
*** empty log message ***
Adam Dickmeiss [Thu, 28 Feb 2002 14:04:11 +0000 (14:04 +0000)]
Fix unvisited status
Adam Dickmeiss [Sun, 17 Feb 2002 09:29:18 +0000 (09:29 +0000)]
Robot honours robots meta tag
Adam Dickmeiss [Wed, 14 Nov 2001 09:15:23 +0000 (09:15 +0000)]
File status written with counts of areas: unvisited, bad, visited.
Tag area src=.. used for relative links.
Adam Dickmeiss [Tue, 13 Nov 2001 11:17:26 +0000 (11:17 +0000)]
MIME check when reading HTTP header (not when reading content).
File robots.txt always read - even when text/plain is denied.
Adam Dickmeiss [Fri, 9 Nov 2001 13:26:50 +0000 (13:26 +0000)]
Robot follows <frame src=...>.
Adam Dickmeiss [Thu, 8 Nov 2001 14:22:21 +0000 (14:22 +0000)]
Added tests script.
Adam Dickmeiss [Thu, 8 Nov 2001 13:49:06 +0000 (13:49 +0000)]
Fixed bug regarding relative URLs.
Adam Dickmeiss [Thu, 8 Nov 2001 10:23:02 +0000 (10:23 +0000)]
Fixed bug in skipSpace (didn't check for null-byte).
Adam Dickmeiss [Wed, 7 Nov 2001 11:50:07 +0000 (11:50 +0000)]
Use simpler regular expression to avoid Tcl regsub error (Tcl8.0.4-5).
Adam Dickmeiss [Wed, 7 Nov 2001 11:30:52 +0000 (11:30 +0000)]
Glob-expressions may be expressed as a list in rules (multi-OR).
Adam Dickmeiss [Wed, 31 Oct 2001 08:51:49 +0000 (08:51 +0000)]
Robot saves metadata with unique names in directory "flat" (if it exists).
Adam Dickmeiss [Tue, 30 Oct 2001 08:29:54 +0000 (08:29 +0000)]
Pattern may be negated in rules (! as first character does that)
Adam Dickmeiss [Fri, 26 Oct 2001 13:26:11 +0000 (13:26 +0000)]
Implemented Allow/deny rules. Better Tcl autoconfig.
Adam Dickmeiss [Fri, 29 Jun 2001 22:25:55 +0000 (22:25 +0000)]
Yet another fix regarding relative links.
Adam Dickmeiss [Fri, 29 Jun 2001 21:47:31 +0000 (21:47 +0000)]
Added option to specify Accept-Language.
Adam Dickmeiss [Thu, 7 Jun 2001 08:17:00 +0000 (08:17 +0000)]
Fixes for robots.txt handling (bug introduced by previous commit).
Adam Dickmeiss [Thu, 7 Jun 2001 08:10:10 +0000 (08:10 +0000)]
Bug fix for relative links.
Adam Dickmeiss [Wed, 6 Jun 2001 09:37:18 +0000 (09:37 +0000)]
Added some character entities for mapping.
Adam Dickmeiss [Wed, 6 Jun 2001 07:10:31 +0000 (07:10 +0000)]
Added README. Ignore case in keywords in robots.txt.
Adam Dickmeiss [Tue, 5 Jun 2001 08:44:50 +0000 (08:44 +0000)]
maxDistance set to 50 default.
Adam Dickmeiss [Tue, 5 Jun 2001 07:46:00 +0000 (07:46 +0000)]
Remove characters after semicolon in header contents.
Adam Dickmeiss [Tue, 27 Feb 2001 10:45:44 +0000 (10:45 +0000)]
Minor changes.
Adam Dickmeiss [Mon, 26 Feb 2001 22:51:51 +0000 (22:51 +0000)]
Added config for zebra/zmbol.
Adam Dickmeiss [Tue, 23 Jan 2001 14:28:41 +0000 (14:28 +0000)]
Minor fix for anchor references.
Adam Dickmeiss [Tue, 23 Jan 2001 12:05:06 +0000 (12:05 +0000)]
Removed YAZ dependency.
Adam Dickmeiss [Tue, 23 Jan 2001 11:26:43 +0000 (11:26 +0000)]
Added options for the robot.
Adam Dickmeiss [Tue, 23 Jan 2001 09:20:32 +0000 (09:20 +0000)]
Multiple http connections. Bug fixes.
Adam Dickmeiss [Mon, 11 Dec 2000 17:11:03 +0000 (17:11 +0000)]
Fixed problem with links having .. for root directory of web server.
Thank you FrontPage.
Adam Dickmeiss [Sun, 10 Dec 2000 22:27:48 +0000 (22:27 +0000)]
Implemented robots.txt rules.
Adam Dickmeiss [Fri, 8 Dec 2000 22:46:53 +0000 (22:46 +0000)]
File robots.txt now read for each domain.
Pages are now fetched in a Round-robin fashion.
Adam Dickmeiss [Fri, 8 Dec 2000 08:55:35 +0000 (08:55 +0000)]
DCdot no longer relies on htmlSwitch.
Adam Dickmeiss [Thu, 7 Dec 2000 20:16:11 +0000 (20:16 +0000)]
Added -nonest for htmlSwitch statement. Robot puts reference to
bad URLs in bad area.
Adam Dickmeiss [Mon, 27 Dec 1999 11:49:30 +0000 (11:49 +0000)]
Major speed improvement.
Adam Dickmeiss [Thu, 4 Feb 1999 21:32:00 +0000 (21:32 +0000)]
Updated configure script.
Per M. Hansen [Thu, 4 Feb 1999 20:37:25 +0000 (20:37 +0000)]
Changed tags for the output.
Adam Dickmeiss [Thu, 15 Oct 1998 13:27:19 +0000 (13:27 +0000)]
Minor changes.
Adam Dickmeiss [Thu, 15 Oct 1998 12:31:25 +0000 (12:31 +0000)]
Added configure script.
Adam Dickmeiss [Thu, 15 Oct 1998 12:30:59 +0000 (12:30 +0000)]
Bug fixes. Robot saves body of text without tags and java script sections.
Adam Dickmeiss [Tue, 6 Aug 1996 14:04:22 +0000 (14:04 +0000)]
Initial revision