This commit was manufactured by cvs2svn to create tag 'ripnews-release_0_5_2'.
Ward Wouts 2005-02-09 09:14:35 +00:00
parent 264eb2d3dc
commit 6eb30ec4e6
22 changed files with 4429 additions and 0 deletions

View file

@ -0,0 +1,113 @@
# $Dwarf: CHANGELOG,v 1.37 2005/02/05 08:29:36 ward Exp $
# $Source$
from 0.5.1 to 0.5.2
- major improvement in memory usage
- speed ups
- don't try to fetch really old headers
from 0.5.0 to 0.5.1
- fix some yenc problems with threads
- fix some thread return problems
from 0.2.3 to 0.5.0
- changes to make it work with ruby 1.8.1
- internal structures changed in article.rb
- huge memory usage improvements: about 90% less memory usage
- (more) gracefully handle bad yencodings
- enable file inclusion
- add MODE READER command
- use a thread for decoding for multi-part posts
- locking cleaned up, no more calling of ps(1)
from 0.2.2 to 0.2.3
- notify when cachedir doesn't exist
- expand ${HOME} in config to your homedir
- fix problem where the limiting of the number of headers
to get in one call wouldn't work
- don't change cache files in place
- keep old config if there are errors while reloading
- change cache format to a file per server model (use
cacheconverter to convert cache files)
- add license text
- lots of small fixes
from 0.2.1 to 0.2.2
- improve output layout
- show running time
- PID lockfile implementation
- catch another error
- fetch subjects sorted so you get a better chance at getting
full series
- now your TEMPDIR can be on another drive than your DATADIR
- reread config on SIGHUP
from 0.2.0 to 0.2.1
- fail gracefully at a lack of configuration
- fail gracefully if tempdir doesn't exist or isn't writable
- implement DELEXT configuration option
- implement OPT_MR configuration option
from 0.1.0 to 0.2.0
- fix extension enforcing
- code cleanups
- split up uudecoding and ydecoding
- add some regression tests
- various bug fixes
- remove articles from newsrc that aren't on the server any
longer
- major speed improvements
from 0.0.9 to 0.1.0
- allow comments after continuing lines, like this:
OPT_I=(?i)( \
agresion| \ # Paul
apex theory| \
at the drive in| \ # Paul
bad religion| \
- some speed ups
- many more exceptions are handled
- more consistent error messages
from 0.0.8 to 0.0.9
- maxfilelength check
- improved subject checking
- linebuffered stdout
- always use push when adding stuff to an array, this is way more
efficient than +=
- always use << when adding stuff to a string, this is way more
efficient than +=
from 0.0.7 to 0.0.8
- more and simpler exceptions
- better argument checking
- more helpful help
from 0.0.6 to 0.0.7
- use exceptions for a lot of problems
- code cleanups
from 0.0.5 to 0.0.6
- new option -C for combined filenames, e.g. "subject-[filename]"
- prevent reconnect loops
- be more paranoid with decoding yEnc-encoded articles
- more/better timeouts
from 0.0.4 to 0.0.5
- implement timeouts on article fetching
(no more "hangs", hopefully)
- remove servers from list on connection failure
- much more robust
from 0.0.3 to 0.0.4
- server reconnects now work
from 0.0.2 to 0.0.3
- filtering on file extensions
- multiple servers are now tried in order
from 0.01 to 0.02
- yEnc support by Stijn Hoop. Thanks.
- change cache file format
- sort cache file
- minor bugs

View file

@ -0,0 +1,8 @@
# $Dwarf: INSTALL,v 1.1 2002/05/05 20:05:11 ward Exp $
# $Source$
For now the easiest way to install this is just extract the tarball in
its own directory and run ./ripnews.rb from there. Before running you
should make your own .ripnewsrc configuration file which is described in
the README file. You may have to change the first line in ripnews.rb to
point to your ruby executable.

View file

@ -0,0 +1,183 @@
# $Dwarf: README,v 1.17 2005/02/05 11:58:56 ward Exp $
# $Source$
Ripnews is a bulk downloader for usenet. It's quite flexible in terms of
configuration. Some of its features are:
- basic support for multiple servers per group
- caching of article headers to speed up reading of newsgroups
- newsrc file support (one newsrc file per server)
- flexible but simple configuration
Configuration:
==============
I'll just give a commented example config, it should be pretty clear,
after that I'll list the possible options.
<== cut here ==>
# Set the default NNTPSERVER to localhost
NNTPSERVER=localhost
# Set the cachedir, this is where the subject caches are stored
# without this ripnews will be much slower (but should still work)
CACHEDIR=/mnt/newspace/News/.ripnews_caches
# PID lockfile, prevents multiple ripnews processes from running at the
# same time [global keyword]
LOCKFILE=/local/newspace/News/.ripnewslock
# Set the datadir, this is where a subdir for each group will be made to
# store the ripped articles
DATADIR=/mnt/newspace/News
# Set the tempdir, used to store the undecoded data. Without this ripnews
# uses a lot more memory
TEMPDIR=/mnt/newspace/News/ripnews_temp
# Set include pattern to a case insensitive "grateful.dead"
OPT_I=(?i)grateful.dead
# Set the base newsrc name. The server name will be appended.
NEWSRCNAME=/ward/src/ruby/ripnews/.newsrc
# Set the permission to create subdirs with
PERMISSION=0700
# Set the niceness of the ripnews process [global keyword]
NICE=20
# For alt.binaries.e-book and alt.binaries.e-books change from defaults...
alt.binaries.e-book| \
alt.binaries.e-books {
# Set another include pattern
OPT_I=(?i)(bible|dickens|shakespeare)
}
alt.binaries.e-book.flood {
# Add to default pattern, this will not be case insensitive
# anymore, because that's how ruby patterns work
OPT_I+=|george.orwell
}
# For all of alt.binaries.e-book, alt.binaries.e-books and
# alt.binaries.e-book.flood change some value
alt.binaries.e-book| \
alt.binaries.e-books| \
alt.binaries.e-book.flood {
# Sets long filenames. If this is set the subject will be used
# as a filename instead of the name specified in the encoding.
OPT_L = true
}
# Change default server to news.tilbu1.nb.nl.home.com. Since the config
# is parsed in order, this will be used from here on down
NNTPSERVER=news.tilbu1.nb.nl.home.com
alt.binaries.music.classical| \
alt.binaries.sounds.lossless.classical| \
alt.binaries.sounds.mp3.classical {
# Add news4.euro.net as a second server for
# alt.binaries.music.classical,
# alt.binaries.sounds.lossless.classical and
# alt.binaries.sounds.mp3.classical
NNTPSERVER+=|news4.euro.net
}
alt.binaries.music.classical| \
alt.binaries.sounds.lossless.classical| \
alt.binaries.sounds.mp3.classical {
OPT_L=true
OPT_I=(?i)( \
verdi| \
vivaldi| \
mozart| \
beethoven \
)
}
<== cut here ==>
Supported commandline options:
------------------------------
"-I", "--include" Set include pattern.
"-c", "--configfile" Specify a different config file. Default
.ripnewsrc
"-L", "--longname" Sets long filenames.
"-C", "--combinedname" Sets combined filenames.
"-X", "--exclude" Set exclude pattern.
"-T", "--test" Set test mode. Newsrc files will not be written
to.
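The options above could be parsed along these lines. This is only a
sketch using Ruby's standard optparse library, not necessarily how
ripnews does it; the parse_args helper and the keys of the hash it
returns are names I made up for the example.

```ruby
require 'optparse'

# Hypothetical helper: turn an argv-style array into an options hash.
def parse_args(argv)
  opts = { :configfile => ".ripnewsrc" }   # default named in this README
  parser = OptionParser.new do |o|
    o.on("-I", "--include PATTERN", "Set include pattern")     { |v| opts[:include] = v }
    o.on("-X", "--exclude PATTERN", "Set exclude pattern")     { |v| opts[:exclude] = v }
    o.on("-c", "--configfile FILE", "Use another config file") { |v| opts[:configfile] = v }
    o.on("-L", "--longname", "Use long filenames")             { opts[:longname] = true }
    o.on("-C", "--combinedname", "Use combined filenames")     { opts[:combinedname] = true }
    o.on("-T", "--test", "Test mode, newsrc is not written")   { opts[:test] = true }
  end
  parser.parse(argv)   # parse (without !) leaves argv itself untouched
  opts
end
```

Calling parse_args(ARGV) at startup would then give one hash that can
be merged over the values read from the config file.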
Supported config options:
-------------------------
OPT_I=<pattern> Set include pattern.
OPT_L=<bool> Set long filenames.
OPT_C=<bool> Sets combined filenames.
OPT_X=<pattern> Set exclude pattern. Ripnews will read articles matching this pattern but it will not attempt
to download them.
OPT_MR=<pattern> Set "mark read" pattern. Ripnews will place
articles matching this pattern in your newsrc,
afterwards they will never be present in memory
again. Great for reducing memory usage when
checking a group for the first time.
OPT_T=<bool> Set test mode. Newsrc files will not be written
to.
TEMPDIR=<dir> Set tempdir location.
NNTPSERVER=<server>[|server] Set NNTPSERVER names
CACHEDIR=<dir> Set cachedir location.
DATADIR=<dir> Set output dir location.
NEWSRCNAME=<newsrcbase> Specify newsrc basename. Server names
will be appended.
PERMISSION=<perm> Set permission bits for directory
creation. Standard unix style, eg. 0755.
EXTENSIONS=<pattern> Set extension include pattern.
OPT_M=<pattern> Set EXTENSIONS just for multi part messages.
OPT_S=<pattern> Set EXTENSIONS just for single part messages.
DELEXT=<pattern> Set extension "mark read" pattern.
OPT_MD=<pattern> Set DELEXT just for multi part messages.
OPT_SD=<pattern> Set DELEXT just for single part messages.
INCLUDEFILE=<file> Include another file, only works in main config.
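To make the KEY=value and KEY+=value forms concrete, here is a small
sketch of how such lines could be applied to a hash. It is an
illustration only: parse_assignments is a made-up helper, and it
leaves out group blocks ({ ... }), line continuations and INCLUDEFILE.

```ruby
# Hypothetical helper: apply "KEY=value" and "KEY+=value" lines to a hash.
# Group blocks ({ ... }), line continuations and INCLUDEFILE are left out.
def parse_assignments(lines, home = ENV["HOME"].to_s)
  config = {}
  lines.each do |raw|
    line = raw.strip
    next if line.empty? || line.start_with?("#")   # skip blanks and comments
    m = /\A(\w+)\s*(\+?=)\s*(.*)\z/.match(line)
    next unless m                                  # ignore anything else
    key, op, value = m[1], m[2], m[3]
    value = value.gsub("${HOME}") { home }         # expand ${HOME}
    if op == "+="
      config[key] = config.fetch(key, "") + value  # append to the earlier value
    else
      config[key] = value
    end
  end
  config
end
```

With NNTPSERVER=localhost followed by NNTPSERVER+=|news4.euro.net the
result is "localhost|news4.euro.net", which can be split on "|" to get
the list of servers to try in order.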
Ruby patterns:
--------------
Ruby patterns are a lot like perl patterns, but there are some
differences. (?i) is the modifier to turn on case insensitivity; unlike
perl, this modifier only works on the following block. Luckily you can
group multiple blocks into one by enclosing them with ()'s. So while
'OPT_I=(?i)foo|bar' would match 'foo' case insensitively and 'bar' case
sensitively, 'OPT_I=(?i)(foo|bar)' will match both 'foo' and 'bar' case
insensitively.
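A quick way to check a pattern before putting it in the config is to
try it in Ruby directly. The sketch below is an illustration only;
included? is a made-up helper, not something ripnews provides. How the
ungrouped pattern treats 'BAR' may depend on your Ruby version, so
that case is left for you to try.

```ruby
# Patterns as they would appear after "OPT_I=" in the config.
ungrouped = Regexp.new('(?i)foo|bar')    # (?i) followed by a bare alternation
grouped   = Regexp.new('(?i)(foo|bar)')  # (?i) followed by a single group

# Hypothetical helper: does the subject match the include pattern?
def included?(pattern, subject)
  !(pattern =~ subject).nil?
end

puts included?(ungrouped, "FOO")
puts included?(grouped, "FOO")
puts included?(grouped, "BAR")
# Try included?(ungrouped, "BAR") to see how your Ruby scopes the modifier.
```

The grouped form is the safe choice when every alternative should be
matched case insensitively.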
Other features:
===============
You can make a running ripnews process reread its configuration by
sending it a SIGHUP.
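For the curious, rereading on SIGHUP can be sketched roughly as below.
This is not the actual ripnews code: the ReloadableConfig class and
all of its method names are made up for illustration, and the loader
is assumed to be any callable that returns a fresh config or raises
on errors. The old config is kept when reloading fails. Signal names
like "HUP" are only available on Unix-like systems.

```ruby
# Hypothetical holder for the active configuration.
class ReloadableConfig
  attr_reader :current

  # loader is any callable that returns a fresh config or raises on errors.
  def initialize(loader)
    @loader = loader
    @current = loader.call
    @reload_requested = false
  end

  # Keep the handler tiny: only note that a reload was asked for.
  def install_sighup_handler
    Signal.trap("HUP") { @reload_requested = true }
  end

  # Lets callers request a reload without sending a signal.
  def request_reload
    @reload_requested = true
  end

  # Call this from the main loop, between articles for instance.
  def reload_if_requested
    return false unless @reload_requested
    @reload_requested = false
    begin
      @current = @loader.call
      true
    rescue StandardError => e
      warn "Reload failed, keeping old config: #{e.message}"
      false
    end
  end
end
```

The handler only sets a flag, so the reload itself happens at a point
the main loop chooses, for instance between two downloads.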
Where can I find newsservers:
=============================
freenews.maxbaud.net
www.newzbot.com
www.gj.net/~bhkraft
Known bugs:
===========
There are no known bugs at this moment. If you find any, please let me
know. As with all my software, if it breaks you get to keep _both_
pieces.
Credits:
========
- Stijn Hoop for adding yEnc support
Contact info:
=============
New problems can be reported directly to me at <ward@wouts.nl>. Patches
welcome ;)
Ward Wouts

View file

@ -0,0 +1,31 @@
# $Dwarf: TODO,v 1.27 2005/01/28 20:06:45 ward Exp $
# $Source$
[ ] check for multiple servers (IP addresses) for each name and pick
one that works
[ ] support mime encoding
[ ] support base64
[ ] support quotedprintable
[ ] documentation
[ ] code cleanup
[ ] finish intspan
[ ] profiling/speed ups
[ ] improve error handling
[ ] use exceptions for error handling
[ ] check if xhdr implemented
[ ] write man page
[ ] more regression tests
[ ] update documentation
[ ] implement "skip current article" signal handler
[ ] add user/pass authentication
[ ] optionally save parts of incomplete posts
[ ] with multipart articles, don't write every body to the same file.
This will mess up things if a get_body is repeated because of
exceptions. Use buffering for each body, before writing...
[ ] there is a bug in handling half fetched parts, they'll be fetched
twice; this should be buffered until it's gotten correctly, then
added to the main buffer
[ ] match on poster
[ ] running without a tempdir doesn't work at all
[ ] don't drop connections to servers when switching groups
[ ] keep connections to newsservers alive (don't timeout)

View file

@ -0,0 +1 @@
1234567890

View file

@ -0,0 +1,4 @@
begin 644 testdata
+,3(S-#4V-S@Y,`K/
`
end

View file

@ -0,0 +1,3 @@
=ybegin line=128 size=11 name=testdata
[\]^_`abcZ4
=yend size=11 crc32=E2910DCA

View file

@ -0,0 +1,80 @@
#!/usr/local/bin/ruby
# $Dwarf: uu_test.rb,v 1.1 2003/04/20 16:33:02 ward Exp $
# $Source$
require '../uuencode.rb'
require 'ftools'
def test1
print "Test 1: decoding a file\n"
file = File.open("testdata.uu", "r")
tmpfile = Tempfile.new("uutmp")
tmpfile.sync=true
mode, filename, body = UUEncode.uudecode(file, tmpfile)
if mode != "644"
print " Failed, mode should be 644, but is #{mode}\n"
elsif filename != "testdata"
print " Failed, filename should be \"testdata\", but is \"#{filename}\"\n"
elsif ! File.compare("testdata", tmpfile.path)
print " Failed, result doesn't match reference data\n"
else
print " Successful\n"
end
file.close
tmpfile.close
end
def test2
print "Test 2: decoding an array\n"
file = File.open("testdata.uu", "r")
lines = file.readlines
file.close
file = File.open("testdata", "r")
reference = file.readlines
file.close
mode, filename, body = UUEncode.uudecode(lines)
if mode != "644"
print " Failed, mode should be 644, but is #{mode}\n"
elsif filename != "testdata"
print " Failed, filename should be \"testdata\", but is \"#{filename}\"\n"
elsif reference != body
print " Failed, result doesn't match reference data\n"
else
print " Successful\n"
end
end
def test3
print "Test 3: is_uuencoded\n"
file = File.open("testdata.uu", "r")
lines = file.readlines
file.close
if UUEncode.is_uuencoded(lines)
print " Successful\n"
else
print " Failed\n"
end
end
def test4
print "Test 4: get_filename\n"
file = File.open("testdata.uu", "r")
lines = file.readlines
file.close
filename = UUEncode.get_filename(lines)
if filename == "testdata"
print " Successful\n"
else
print " Failed\n"
end
end
test1
test2
test3
test4

View file

@ -0,0 +1,103 @@
#!/usr/local/bin/ruby
# $Dwarf: yenc_test.rb,v 1.1 2003/04/20 18:32:39 ward Exp $
# $Source$
require '../yenc.rb'
require 'ftools'
def test1
print "Test 1: decoding a file\n"
file = File.open("testdata.ync", "r")
tmpfile = Tempfile.new("ynctmp")
tmpfile.sync=true
mode, filename, body = YEnc.ydecode(file, tmpfile)
if filename != "testdata"
print "Failed, filename should be \"testdata\", but is \"#{filename}\"\n"
elsif ! File.compare("testdata", tmpfile.path)
print "Failed, result doesn't match reference data\n"
else
print "Successful\n"
end
file.close
tmpfile.close
end
def test2
print "Test 2: decoding an array\n"
file = File.open("testdata.ync", "r")
lines = file.readlines
file.close
file = File.open("testdata", "r")
reference = file.readlines
file.close
print " with dos linebreaks\n"
mode, filename, body = YEnc.ydecode(lines)
if filename != "testdata"
print "Failed, filename should be \"testdata\", but is \"#{filename}\"\n"
elsif reference != body
print "Failed, result doesn't match reference data\n"
else
print "Successful\n"
end
lines.collect!{|x| x.chomp("\r\n")}
print " without linebreaks\n"
mode, filename, body = YEnc.ydecode(lines)
if filename != "testdata"
print "Failed, filename should be \"testdata\", but is \"#{filename}\"\n"
elsif reference != body
print "Failed, result doesn't match reference data\n"
else
print "Successful\n"
end
lines.collect!{|x| x.sub(/$/, "\n")}
print " with unix linebreaks\n"
mode, filename, body = YEnc.ydecode(lines)
if filename != "testdata"
print "Failed, filename should be \"testdata\", but is \"#{filename}\"\n"
elsif reference != body
print "Failed, result doesn't match reference data\n"
else
print "Successful\n"
end
end
def test3
print "Test 3: is_yencoded\n"
file = File.open("testdata.ync", "r")
lines = file.readlines
file.close
if YEnc.is_yencoded(lines)
print "Successful\n"
else
print "Failed\n"
end
end
def test4
print "Test 4: get_filename\n"
file = File.open("testdata.ync", "r")
lines = file.readlines
file.close
filename = YEnc.get_filename(lines)
if filename == "testdata"
print "Successful\n"
else
print "Failed\n"
end
end
test1
test2
test3
test4

View file

@ -0,0 +1,182 @@
# $Dwarf: uuencode.rb,v 1.6 2003/07/20 20:32:01 ward Exp $
# $Source$
#
# Copyright (c) 2002, 2003 Ward Wouts <ward@wouts.nl>
#
# Permission to use, copy, modify, and distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
#
require 'tempfile'
class UUEncode
class << self
Debuglevel = 0
def uudecode(data, outfile=nil)
case data.class.to_s
when "Array"
print "Calling _uudecode_array\n" if Debuglevel>0
mode, filename, body = _uudecode_array(data)
when "File", "Tempfile"
unless outfile
print "uudecode: need outfile\n"
exit
end
print "Calling _uudecode_file\n" if Debuglevel>0
mode, filename, body = _uudecode_file(data, outfile)
else
print "Funny stuff in uudecode. Data of class \"#{data.class.to_s}\"\n"
end
return mode, filename, body
end
def _uudecode_file(file, outfile)
mode = 0600
filename = "unknown"
c = 0
lines = file.pos # horrible assumption FH is at end of file
percent = 0
mark = lines/100
file.pos = 0
while (! file.eof)
line = file.gets
print "line: #{line}" if Debuglevel > 0
if line =~ /^begin(.*)/
m = $1
print "beginning matched; rest: #{m}\n" if Debuglevel > 0
if m =~ /^(\s+(\d+))?(\s+(.*?\S))?\s*\Z/
mode = $2
filename = $4
print "found beginning\n" if Debuglevel > 0
else
print "mode, file set to defaults: #{m}\n"
end
break
end
end
if file.eof
print "Not UUencoded!\n"
return false
end
print "c: #{c} mark: #{mark} lines: #{lines}\n" if Debuglevel > 1
while (! file.eof)
if Debuglevel > 1
c = file.pos
if c > mark
print "#{percent}%\n"
print "c: #{c} mark: #{mark} lines: #{lines}\n" if Debuglevel > 1
percent += 1
mark = (lines/100)*(percent+1)
end
end
line = file.gets
print "line: #{line}" if Debuglevel > 1
return mode, filename if line =~ /^end/
next if line =~ /[a-z]/
next if line == nil
next unless ((((line[0] - 32) & 077) + 2) / 3).to_i == (line.length/4).to_i
outfile.print line.unpack("u")
end
print "No \"end\" found!!!\n"
#return mode, file, outfile
return false
end
# I don't think this continues when there are multiple uuencoded blocks...
# it will then have to be called multiple times, grmbl...
# it's getting a mess as we speak...
# should really turn this into a separate class some time...
def _uudecode_array(data)
decode = []
mode = 0600
filename = "unknown"
c = 0
lines = data.length
percent = 0
mark = lines/100
i = 0
while (i < data.length)
if data[i] =~ /^begin(.*)/
m = $1
print "beginning matched; rest: #{m}\n" if Debuglevel > 0
if m =~ /^(\s+(\d+))?(\s+(.*?\S))?\s*\Z/
mode = $2
filename = $4
print "found beginning\n" if Debuglevel > 0
else
print "mode, filename set to defaults: #{m}\n"
end
break
end
i += 1
end
unless (i < data.length)
print "Not UUencoded!\n"
return false
end
while (i < data.length)
if Debuglevel > 1
if c > mark
print "#{percent}%\n"
print "c: #{c} mark: #{mark} lines: #{lines} i: #{i}\n" if Debuglevel > 1
percent += 1
mark = (lines/100)*(percent+1)
end
c += 1
end
line = data[i]
i += 1
return mode, filename, decode if line =~ /^end/
next if line =~ /[a-z]/
next if line == nil
next unless ((((line[0] - 32) & 077) + 2) / 3).to_i == (line.length/4).to_i
unless line.unpack("u").eql?([""])
decode.concat(line.unpack("u"))
end
end
print "No \"end\" found!!!\n"
return false
end
def is_uuencoded(data)
if data.to_s =~ /begin\s+\d+?\s+.*?\S?\s*$/m
return true
else
return false
end
end
def get_filename(data)
i = 0
while i < data.length
line = data[i]
if line =~ /^begin(\s+(\d+))?(\s+(.*?\S))?\s*$/m
return $4
end
i += 1
end
return false
end
end # class
end

View file

@ -0,0 +1,318 @@
# $Dwarf: yenc.rb,v 1.14 2005/02/01 10:16:03 ward Exp $
# $Source$
#
# Copyright (c) 2002, 2003 Ward Wouts <ward@wouts.nl>
#
# Permission to use, copy, modify, and distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
#
require 'tempfile'
class YencError < RuntimeError; end
class YEnc
class << self
Debuglevel = 0
@@ymap = {}
def ydecode(data, outfile=nil)
if @@ymap.empty?
(-106..255).each do |b|
@@ymap[b]=((b-42)%256)
end
end
case data.class.to_s
when "Array"
print "Calling _ydecode_array\n" if Debuglevel>0
mode, filename, body = _ydecode_array(data)
when "File", "Tempfile"
unless outfile
print "ydecode: need outfile\n"
exit
end
print "Calling _ydecode_file\n" if Debuglevel>0
mode, filename, body = _ydecode_file(data, outfile)
else
print "Funny stuff in ydecode. Data of class \"#{data.class.to_s}\"\n"
end
return mode, filename, body
end
def _ydecode_line(line)
i = 0
ll = line.length
ostr = ''
while i < ll
if line[i] == 0x3d
if i == (ll - 1)
raise YencError, "Escape char found as last char of line. This is not allowed by the yEnc standard"
else
i += 1
line[i] -= 64
end
end
# begin
ostr << @@ymap[line[i].to_i]
# rescue TypeError
# puts "this should not happen!!!"
# puts "line[i] contents: '#{line[i]}'\n"
# end
i += 1
end
return ostr
end
def _ydecode_file(file, outfile)
mode = 0600 # mode is a bit stupid with yencoding... I don't get it
filename = "unknown"
lines = file.pos
file.pos = 0
bytes = 0
total = 0
oldpartbegin = 0
oldpartend = 0
search_begin = false
skip = false
while (! file.eof)
line = file.gets
print "line: #{line}" if Debuglevel > 0
if line =~ /^\=ybegin\s+(.*line\=.*)/
m = $1
print "ybegin match; rest: #{m}\n" if Debuglevel > 0
if m =~ /^\s*(part\=(\d+)\s+)?(total\=(\d+)\s+)?(line\=(\d+))(\s*size\=(\d+))(\s*name=(.*?\S))\s*$/
part = $2.to_i
total = $4.to_i
linesize = $6.to_i
totalsize = $8.to_i
filename = $10
if Debuglevel > 0
print "found beginning"
if part != nil
print " of part #{part}"
end
if total != nil
print " of #{total}"
end
print ", linesize = #{linesize}, size = #{totalsize}, filename = #{filename}\n"
end
break
else
print "not a valid yenc begin line\n"
end
end
end
if file.eof
print "Not yencoded!\n"
return false
end
while (! file.eof)
print "at #{file.pos} need to go to #{lines}\n" if Debuglevel > 1
line = file.gets
line.chop!
if line =~ /^=yend\s+(.*)\Z/
m = $1
m =~ /(\s*size=(\d+)\s+)(\s*part=(\d+))?(\s+crc32=(\S+))?/
size = $2.to_i
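# keep part nil when the =yend line has no part= field, otherwise
# nil.to_i would yield 0 and the "part == nil" check below never fires
part = $4 ? $4.to_i : nil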
crc = $6
if size != bytes
print "#{Thread.current.inspect} part size mismatch, is #{bytes}, should be #{size}\n"
end
if part == nil
return mode, filename
end
total += bytes
if total >= totalsize
if total != totalsize
print "#{Thread.current.inspect} total size mismatch, is #{total}, should be #{totalsize}\n"
end
return mode, filename
end
search_begin = true
bytes = 0
next
end
if search_begin && line =~ /^\=ybegin\s+(.*)\Z/
m = $1
search_begin = false
if m =~ /^\s*(part\=(\d+)\s+)?(total\=(\d+)\s+)?(line\=(\d+))(\s*size\=(\d+))(\s*name=(.*?\S))\s*$/
part = $2.to_i
total = $4.to_i
linesize = $6.to_i
totalsize = $8.to_i
filename = $10
print "found beginning of part #{part}, linesize = #{linesize}, size = #{totalsize}, filename = #{filename}\n" if Debuglevel > 0
end
next
end
if search_begin == true
next
end
if line =~ /^=ypart\s+(\s*begin=(\d+))(\s+end=(\d+))/
skip = false
b = $2
e = $4
print " #{Thread.current.inspect} next part begin #{b}, end #{e}\n"
if b.to_i == oldpartbegin && e.to_i == oldpartend
print "Skipping duplicate part\n"
skip = true
next
end
if b.to_i == oldpartend + 1
oldpartend = e.to_i
oldpartbegin = b.to_i
else
raise PermError, "#{Thread.current.inspect} Parts not continuous! last end #{oldpartend}, begin #{b}"
end
next
end
# This seems to be a common 'error' - maybe I misunderstand the spec or
# something
# if line.length != linesize
# print "linesize mismatch, was #{line.length}, should be #{linesize}...\n"
# end
if !skip
print "line: #{line}" if Debuglevel > 0
ostr = _ydecode_line(line)
outfile << ostr
bytes += ostr.length
end
end
print "No \"=yend\" found!!!\n"
return mode, filename, outfile
end
def _ydecode_array(data)
decode = ""
mode = 0600
filename = "unknown"
oldpartend = 0
oldpartbegin = 0
c = 0
lines = data.length
bytes = 0
percent = 0
mark = lines/100
i = 0
while (i < data.length)
if data[i] =~ /^\=ybegin\s+(.*line\=.*)/
m = $1
print "ybegin match; rest: #{m}\n" if Debuglevel > 0
if m =~ /^\s*(part\=(\d+)\s+)?(total\=(\d+)\s+)?(line\=(\d+))(\s*size\=(\d+))(\s*name=(.*?\S))\s*$/
part = $2.to_i
total = $4.to_i
linesize = $6.to_i
size = $8.to_i
filename = $10
print "found beginning, linesize = #{linesize}, size = #{size}, filename = #{filename}\n" if Debuglevel > 0
i += 1
break
else
print "not a valid yenc begin line\n"
end
end
i += 1
end
unless (i < data.length)
print "Not yencoded!\n"
return false
end
while (i < data.length)
line = data[i]
line.chomp!("\n")
line.chomp!("\r")
print "at #{i} need to go to #{data.length}\n" if Debuglevel > 1
print "line: #{line}" if Debuglevel > 0
i += 1
if line =~ /^\=yend(\s+size=(\d+))(\s+crc32=(\S+))?/
size = $2.to_i
crc = $4
if size != decode.length
print "#{Thread.current.inspect} size mismatch, was #{decode.length}, should be #{size}\n"
end
dec = [ decode ]
return mode, filename, dec
end
if line =~ /^=ypart\s+(\s*begin=(\d+))(\s+end=(\d+))/
skip = false
b = $2
e = $4
print " #{Thread.current.inspect} next part begin #{b}, end #{e}\n"
if b.to_i == oldpartbegin && e.to_i == oldpartend
print "Skipping duplicate part\n"
skip = true
next
end
if b.to_i == oldpartend + 1
oldpartend = e.to_i
oldpartbegin = b.to_i
else
raise PermError, "#{Thread.current.inspect} Parts not continuous! last end #{oldpartend}, begin #{b}"
end
next
end
# This seems to be a common 'error' - maybe I misunderstand the spec or
# something
# if line.length != linesize
# print "#{i}: linesize mismatch, was #{line.length}, should be #{linesize}...\n"
# end
if !skip
print "line: #{line}" if Debuglevel > 0
ostr = _ydecode_line(line)
decode << ostr
bytes += ostr.length
end
end
print "#{i}: no \"=yend\" found!!!\n"
dec = [ decode ]
return mode, filename, dec
end
def is_yencoded(data)
if data.to_s =~ /=ybegin/m
return true
else
return false
end
end
def get_filename(data)
i = 0
while i < data.length
line = data[i]
if line =~ /=ybegin\s*(part\=(\d+)\s+)?(total\=(\d+)\s+)?(line\=(\d+))(\s*size\=(\d+))(\s*name=(.*?\S))\s*$/m
return $10
end
i += 1
end
return false
end
end # class
end

View file

@ -0,0 +1,321 @@
################################
#
# nntp.rb - an NNTP client implementing RFC 977
# ported from the Python code by Jefferson Heard
# this software is released under the terms of the GNU Library General Public License
# (C) 2001, Jefferson Heard
#
# Contributors: Jefferson Heard, Ward Wouts
#
# Release History
# 0.1: 11.7.2001 - Initial revision.
# 0.2: 11-9-2001 - fixed regexp bugs,
# fixed XHDR bugs,
# made internal methods private,
# changed constructor default arg
# 0.3: 11-14-2001 - Fixed numerous bugs and made things a little cleaner
# as per the suggestions of Ward Wouts
# 0.4: 11-15-2001 - Fixed statcmd bug - Ward Wouts
# 0.5: 12-06-2001 - Fixed post buf - Ozawa, Sakuro
#################################
require 'socket'
require 'net/protocol'
module Net
# Exceptions raised by NNTP
class NNTPError < RuntimeError; end
class NNTPReplyError < NNTPError; end
class NNTPTemporaryError < NNTPError; end
class NNTPPermanentError < NNTPError; end
class NNTPDataError < NNTPError; end
class NNTP
NNTP_PORT = 119
LONGRESP = ['100', '215', '220', '221', '222', '224', '230', '231', '282']
CRLF = "\r\n"
def initialize(host, port=NNTP_PORT, user=nil, password=nil, readermode=nil)
@debuglevel = 0
@host = host
if port then @port = port else @port = NNTP_PORT end
@socket = TCPSocket.new @host, @port
@welcome = getresp
readermode_afterauth = false
if readermode
begin
@welcome = shortcmd('mode reader')
rescue NNTPPermanentError
rescue NNTPTemporaryError
# exceptions carry the server reply as their message; there is no "response" method
if user and $!.message[0...3] == '480'
readermode_afterauth = true
else
raise
end
end
end
if user
resp = shortcmd "authinfo user #{user}"
if resp[0...3] == '381' # then we need a password
raise NNTPReplyError, resp, caller unless password
resp = shortcmd "authinfo pass #{password}"
raise NNTPPermanentError, resp, caller unless resp[0...3] == '281'
end
end
if readermode_afterauth
begin
@welcome = shortcmd('mode reader')
rescue NNTPPermanentError
end
end
end
def welcome
puts "*welcome*, #{@welcome}" if @debuglevel > 0
return @welcome
end
attr_writer :debuglevel
def putline(line)
puts '*put* '+line+'\r\n' if @debuglevel > 1
@socket.send "#{line}\r\n", 0
end
def putcmd(cmd)
puts "*cmd* #{cmd}" if @debuglevel > 0
putline cmd
end
def getline
line = @socket.readline
print "*getline* '#{line}'" if @debuglevel > 0
line.chomp!("\r\n")
return line
end
def getresp
resp = getline
puts "*getresp* #{resp}" if @debuglevel > 0
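# take the first character as a one-character string and dispatch on the reply class
c = resp[0, 1]
case c
when '4' then raise NNTPTemporaryError, resp, caller
when '5' then raise NNTPPermanentError, resp, caller
when '1', '2', '3' then nil # informational, success or continue: fall through
else raise NNTPError, resp, caller # not a valid NNTP status line
end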
return resp
end
def getlongresp
resp = getresp
raise NNTPReplyError, resp, caller unless LONGRESP.include? resp[0...3]
list = []
while true
line = getline
break if line == '.'
line.slice!(0) if line.to_s[0...2] == '..'
list << line
end
return resp, list
end
def shortcmd(line)
putcmd line
return getresp
end
def longcmd(line)
putcmd line
return getlongresp
end
def newgroups(date, time)
return longcmd("NEWGROUPS #{date.to_s} #{time.to_s}")
end
def newnews(group, date, time)
return longcmd("NEWNEWS #{group} #{date.to_s} #{time.to_s}")
end
def list
resp, list = longcmd "LIST"
list.each_index {|ix|
list[ix] = list[ix].split " "
}
return resp, list
end
def group(name)
resp = shortcmd "GROUP #{name}"
raise NNTPReplyError, resp, caller unless resp[0...3] == '211'
words = resp.split " "
# "count, first, last = 0" would only set count; set all three defaults
count = first = last = 0
n = words.length
if n>1
count = words[1]
if n>2
first = words[2]
if n>3
last = words[3]
if n>4
name = words[4].downcase
end
end
end
end
return resp, count, first, last, name
end
def help
return longcmd("HELP")
end
def statparse(resp)
raise NNTPReplyError, resp, caller unless resp[0...2] == '22'
words = resp.split " "
nr = 0
id = ''
n = words.length
if n>1
nr = words[1]
if n>2
id = words[2]
end
end
return resp, nr, id
end
def statcmd(line)
resp = shortcmd line
return statparse(resp)
end
def stat(id)
return statcmd("STAT #{id}")
end
def next
return statcmd("NEXT")
end
def last
return statcmd("LAST")
end
def articlecmd(line)
resp, list = longcmd line
resp, nr, id = statparse(resp)
return resp, nr, id, list
end
def head(id)
return articlecmd("HEAD #{id}")
end
def body(id)
return articlecmd("BODY #{id}")
end
def article(id)
return articlecmd("ARTICLE #{id}")
end
def slave(id)
return shortcmd("SLAVE")
end
def mode_reader()
return shortcmd("MODE READER")
end
def xhdr(hdr, str)
pat = Regexp.new '^([0-9]+) ?(.*)\n?'
resp, lines = longcmd "XHDR #{hdr} #{str}"
lines.each_index {|ix|
line = lines[ix]
m = pat.match line
lines[ix] = m[1..2] if m
}
return resp, lines
end
def xover(start, ed)
begin
resp, lines = longcmd "XOVER #{start}-#{ed}"
xover_lines = []
lines.each {|line|
elements = line.split("\t")
elements[5] = elements[5].split(" ")
xover_lines << elements
}
return resp, xover_lines
rescue RuntimeError
raise(NNTPDataError, lines, caller)
end
end
def xgtitle(group)
line_pat = Regexp.new "^([^\t]+)[\t]+(.*)$"
resp, raw_lines = longcmd "XGTITLE #{group}"
lines = []
raw_lines.each {|line|
match = line_pat.match line.strip
lines << match[1..2] if match
}
return resp, lines
end
def date
resp = shortcmd "DATE"
raise NNTPReplyError unless resp[0...3] == '111'
# String has no split! method, so split into a separate array instead
parts = resp.split " "
raise NNTPDataError unless parts.length == 2
date = parts[1][2...8]
time = parts[1][-6..-1]
raise(NNTPDataError, resp, caller) unless date.length == 6 and time.length == 6
return resp, date, time
end
def post(f)
resp = shortcmd "POST"
raise NNTPReplyError unless resp =~ /^3/ #[0] == 3
lines = f.readlines
lines.each {|line|
line.chop!
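# line[0, 1] is a one-character string on every Ruby version;
# line[0] is a number on ruby 1.8 and would never equal '.'
line = '.' + line if line[0, 1] == '.'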
putline line
}
putline '.'
return getresp
end
def quit
resp = shortcmd "QUIT"
@socket.close_read
@socket.close_write
return resp
end
private :statparse, :getline, :putline, :articlecmd, :statcmd
protected :getresp, :getlongresp
end
end
if __FILE__ == $0
s = Net::NNTP.new('news')
resp, count, first, last, name = s.group('comp.lang.ruby')
puts resp
puts "group #{name} has #{count} articles, range #{first} to #{last}"
resp, subs = s.xhdr('subject', "#{first}-#{last}")
puts resp
subs.each do |sub| puts sub end
resp = s.quit
puts resp
end

View file

@ -0,0 +1,872 @@
# $Dwarf: article.rb,v 1.108 2005/02/06 13:42:03 ward Exp $
# $Source$
#
# Copyright (c) 2002, 2003, 2004, 2005 Ward Wouts <ward@wouts.nl>
#
# Permission to use, copy, modify, and distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
#
require 'set/intspan'
require 'net/nntp'
require 'news/newsrc'
require 'tempfile'
require 'timeout'
#require 'yaml'
class ArticleError < RuntimeError; end
class TempError < ArticleError; end
class PermError < ArticleError; end
class Article
Debuglevel = 1
Message = Struct.new(:messid, :id, :server, :subject)
def initialize(nntpservers, groupname, newsrc="~/.newsrc")
@messageinfo = []
@grouped = false
@groups = {}
@gotten = {}
@group = groupname
@preselectpattern = Regexp.new('^')
@cache_buf = {}
@serverlist = nntpservers.split('|')
@connections = {}
@serverlist.collect{|server|
@connections[server] = {}
@cache_buf[server] = []
begin
p server
p Time.now
begin
timeout(60) do
@connections[server]["nntp"] = Net::NNTP.new(server)
end
resp = @connections[server]["nntp"].mode_reader
rescue TimeoutError
sleep 3
retry
end
p Time.now
@connections[server]["skip_ids"] = Set::IntSpan.new()
@connections[server]["newsrc"] = News::Newsrc.new("#{newsrc}.#{server}")
set_skip_ids(server, @connections[server]["newsrc"].marked_articles(@group))
rescue SocketError, Errno::EINVAL, EOFError, Errno::ETIMEDOUT
print "Connection to #{server} failed: #{$!}\n"
del_server(server)
end
}
end
def reconnect(server)
retries = 0
begin
@connections[server]["nntp"].quit
# does this help with memory usage? I don't think so
#@connections[server].delete("nntp")
#GC.start
rescue Errno::EPIPE, Errno::ECONNRESET, EOFError
end
begin
sleep 3
#timeout(180) do
timeout(60) do
@connections[server]["nntp"] = Net::NNTP.new(server)
end
resp = @connections[server]["nntp"].mode_reader
rescue SocketError, Errno::EINVAL, EOFError, Errno::ETIMEDOUT, TimeoutError
print "Reconnect to #{server} failed: #{$!}\n"
if retries > 1
del_server(server)
raise PermError, "Couldn't connect to #{server}"
else
retries += 1
retry
end
end
print "Successfully reconnected to #{server}\n"
end
def memusage
print "memprof:\n"
print "global:\n"
# for i in global_variables
# print "#{i}\n"
# end
# print "local:\n"
# for i in local_variables
# print "#{i}\n"
# end
for i in self.instance_variables
puts i
print "X: "
begin
puts self.instance_eval(i).size
rescue NoMethodError
end
end
end
def set_preselect_pattern(regexp)
@preselectpattern = Regexp.new(regexp)
end
def preselect(subject)
if subject =~ @preselectpattern
return true
else
return false
end
# return ( subject =~ @preselectpattern )
end
def add(id, messid, subject, server)
@messageinfo.push(Message.new(messid, id.to_i, server, subject))
@grouped = false
end
def del_server(server)
print "Removing server #{server} from list\n"
@connections.delete(server)
@serverlist.delete(server)
end
def get_articles(cachedir=false)
if cachedir != false
cache_check(cachedir)
end
for server in @connections.keys
begin
first, last = get_group_info(server)
rescue PermError
print "Error: #{$!}\n"
del_server(server)
next
end
if first.to_i <= last.to_i
# available articles on server
@connections[server]["first"] = first ? first.to_i : 0
@connections[server]["last"] = last ? last.to_i : 0
if Debuglevel > 0
print " Server: #{server}\n"
print " First: #{first}\n"
print " Last: #{last}\n"
end
# clean up old newsrc entries
if @connections[server]["first"] > 0
@connections[server]["newsrc"].unmark_range(@group, 0, (@connections[server]["first"] - 1).to_s)
@connections[server]["newsrc"].save
end
else
print " First article has higher number than last article on server #{server}.\n"
del_server(server)
end
end
cache_read(cachedir)
# stuff that is really too old will never be completed, so don't even try to fetch it
# we do want to fetch some of it, since logging shows that older articles do get downloaded
for server in @connections.keys
if @connections[server]["skip_ids"].max
articles = @connections[server]["last"] - @connections[server]["first"]
if articles > 10000
fillerend = (@connections[server]["skip_ids"].max - (articles/10)).to_i
else
fillerend = @connections[server]["skip_ids"].max - 1000
end
if @connections[server]["skip_ids"].min && fillerend > @connections[server]["skip_ids"].min
@connections[server]["skip_ids"] = @connections[server]["skip_ids"].union("#{@connections[server]["skip_ids"].min}-#{fillerend}")
# p "filling #{@connections[server]["skip_ids"].min}-#{fillerend}"
end
end
end
for server in @connections.keys
print " reading articles from server: #{server}\n"
range = Set::IntSpan.new("#{@connections[server]["first"]}-#{@connections[server]["last"]}")
rangelist = rechunk_runlist(range.diff(@connections[server]["skip_ids"]).run_list)
print "rangelist: #{rangelist}\n" if Debuglevel > 2
print "rangelist: #{rangelist.class.to_s}\n" if Debuglevel > 2
print "rangelist elements: #{range.diff(@connections[server]["skip_ids"]).elements}\n" if Debuglevel > 2
begin
unless rangelist == nil or rangelist =~ /^$/
headerlines = 0
for i in rangelist.split(',')
print "i: #{i}\n" if Debuglevel > 2
begin
# resp, xover_lines = get_xover(server, i)
resp, subj_lines = get_xhdr(server, i, "subject")
resp, messid_lines = get_xhdr(server, i, "message-id")
rescue TempError
printerr(server)
next
end
art = {}
# xover_lines.collect{|x|
# art[x[0]] = {} unless art.has_key?(x[0])
# art[x[0]]["subject"] = x[1]
# art[x[0]]["messid"] = x[4]
# print "art id: #{x[0]} subj: #{x[1]}\n" if Debuglevel > 2
# print "art id: #{x[0]} messid: #{x[4]}\n" if Debuglevel > 2
# }
subj_lines.collect{|x|
art[x[0]] = {} unless art.has_key?(x[0])
art[x[0]]["subject"] = x[1]
print "art id: #{x[0]} subj: #{x[1]}\n" if Debuglevel > 2
}
messid_lines.collect{|x|
art[x[0]] = {} unless art.has_key?(x[0])
art[x[0]]["messid"] = x[1]
print "art id: #{x[0]} messid: #{x[1]}\n" if Debuglevel > 2
}
for id in art.keys
if art[id].has_key?("subject") and art[id].has_key?("messid")
print "adding: #{art[id]["messid"]}, #{id}, #{server}, #{art[id]["subject"]}\n" if Debuglevel > 2
# @newids[server][id.to_i] = true
# maybe only do this if the preselector selects it,
# and otherwise just add a line to the cache,
# but don't keep it in memory
if preselect(art[id]["subject"])
add(id, art[id]["messid"], art[id]["subject"], server)
end
cache_add(cachedir, id, art[id]["messid"], art[id]["subject"], server)
end
end
# headerlines += xover_lines.length
headerlines += subj_lines.length
if headerlines >= 500
cache_save(cachedir, server)
headerlines = 0
end
end
end
rescue PermError
del_server(server)
next
end
cache_save(cachedir, server)
end
GC.start
end
def get_group_info(server)
timedout = 0
errs = 0
resp = ""
first = ""
last = ""
begin
timeout(30) do
begin
resp, count, first, last, name = @connections[server]["nntp"].group(@group)
rescue Net::NNTPReplyError
printerr(server)
if ( $!.to_s =~ /^503|^400/ )
reconnect(server)
retry
else
raise PermError, "#{$!}"
end
rescue Errno::EPIPE, Errno::ECONNRESET, Errno::ETIMEDOUT, EOFError, Errno::EINVAL
printerr(server)
# count the failure so the limit below can actually trigger
errs += 1
raise PermError, "Too many errors! (get_group_info)" if errs > 3
reconnect(server)
retry
end
end
rescue TimeoutError
timedout += 1
raise PermError, "Too many timeouts! (get_group_info)" if timedout > 1
print "Time out, reconnecting to server... (get_group_info)\n"
reconnect(server)
retry
end
return first, last
end
def get_xhdr(server, range, header)
timedout = 0
resp = ""
lines = []
begin
timeout(180) do
begin
p Time.now if Debuglevel > 1
print "getting headers: #{header}, #{range}\n" if Debuglevel > 1
resp, lines = @connections[server]["nntp"].xhdr(header, range)
if resp.to_i == 500
print "xhdr not implemented\n"
print "Error: #{$!}\n"
end
unless resp.to_i >= 200 and resp.to_i < 300
print "got response #{resp} while reading group #{@group} from #{server}\n"
raise TempError
end
rescue Net::NNTPReplyError
printerr(server)
if ( $!.to_s =~ /^503|^400/ )
reconnect(server)
get_group_info(server)
retry
else
print "Won't handle this... yet :(\n"
end
rescue Errno::EPIPE, Errno::ECONNRESET, EOFError
printerr(server)
reconnect(server)
get_group_info(server)
retry
end
end
return resp, lines
rescue TimeoutError
print "Time out, reconnecting to server (get_xhdr)\n"
timedout += 1
raise PermError, "Too many timeouts! (get_xhdr)" if timedout > 1
reconnect(server)
get_group_info(server)
retry
end
end
def get_xover(server, range)
timedout = 0
resp = ""
lines = []
start, ed = range.split("-")
unless ed
ed = start
end
begin
timeout(180) do
begin
p Time.now if Debuglevel > 1
print "getting headers: #{range}\n" if Debuglevel > 1
resp, lines = @connections[server]["nntp"].xover(start, ed)
if resp.to_i == 500
print "xover not implemented\n"
print "Error: #{$!}\n"
end
unless resp.to_i >= 200 and resp.to_i < 300
print "got response #{resp} while reading group #{@group} from #{server}\n"
raise TempError
end
rescue Net::NNTPReplyError
printerr(server)
if ( $!.to_s =~ /^503|^400/ )
reconnect(server)
get_group_info(server)
retry
else
print "Won't handle this... yet :(\n"
end
rescue Errno::EPIPE, Errno::ECONNRESET, EOFError
printerr(server)
reconnect(server)
get_group_info(server)
retry
end
end
return resp, lines
rescue TimeoutError
print "Time out, reconnecting to server (get_xover)\n"
timedout += 1
raise PermError, "Too many timeouts! (get_xover)" if timedout > 1
reconnect(server)
get_group_info(server)
retry
end
end
def get_groupname
return @group
end
def get_body(server, message)
#p "get_body"
timedout = 0
retries = 0
resp = ""
id = ""
messid = ""
list = []
begin
timeout(180) do
begin
list = []
resp, id, messid, list = @connections[server]["nntp"].body(message)
rescue Net::NNTPReplyError
# take the message text explicitly; String + Exception only works on Ruby 1.8
a = $!.to_s
printerr(server)
if retries == 0 && (a =~ /^503/ || a =~ /^400/)
reconnect(server)
get_group_info(server)
retries = 1
retry
end
return false
rescue EOFError, NameError
printerr(server)
return false
rescue Errno::EPIPE, Errno::ECONNRESET
printerr(server)
reconnect(server)
get_group_info(server)
retry
end
end
return resp, id, messid, list
rescue TimeoutError, Errno::ETIMEDOUT
print "Time out, reconnecting to server (get_body)\n"
timedout += 1
raise PermError, "Too many timeouts! (get_body)" if timedout > 1
reconnect(server)
get_group_info(server)
retry
end
end
def get_group_body(subj)
#p "get_group_body"
result = []
group_subject_sort(subj)
# puts @groups[subj].to_yaml
return false if @groups[subj]["messageinfo"] == nil
for i in (0...@groups[subj]["messageinfo"].length)
unless @gotten.has_key?(@groups[subj]["messageinfo"][i][:messid])
print "getting article: #{i}\n" if Debuglevel > 1
print "getting article: #{subj}\n" if Debuglevel > 1
print "full subject: #{@groups[subj]["messageinfo"][i][:subject]}\n" if Debuglevel > 0
print "message id: #{@groups[subj]["messageinfo"][i][:messid]}\n" if Debuglevel > 1
print "id: #{@groups[subj]["messageinfo"][i][:id]}\n" if Debuglevel > 1
print "server: #{@groups[subj]["messageinfo"][i][:server]}\n" if Debuglevel > 0
resp = false
while resp == false
if @serverlist.include?(@groups[subj]["messageinfo"][i][:server])
resp, id, messid, list = get_body(@groups[subj]["messageinfo"][i][:server], @groups[subj]["messageinfo"][i][:messid])
else
resp = false
end
if resp == false
if Debuglevel > 1
print "mess-id i: #{@groups[subj]["messageinfo"][i][:messid]}\n"
# XXX this could be done more neatly
print "mess-id i+1: #{@groups[subj]["messageinfo"][i+1][:messid]}\n" if @groups[subj]["messageinfo"][i+1] != nil
end
if (i+1 < @groups[subj]["messageinfo"].length) and
(@groups[subj]["messageinfo"][i][:messid] == @groups[subj]["messageinfo"][i+1][:messid])
print " Trying next server...\n"
i += 1
else
raise TempError, " Message-id not on another server"
end
end
end
@gotten[ @groups[subj]["messageinfo"][i][:messid] ] = true
result = list
end
end
return result
end
def get_group_body_first(subj)
#p "get_group_body_first"
group_subject_sort(subj)
i = 0
unless @groups[subj]["messageinfo"] != nil && @groups[subj]["messageinfo"][0][:messid]
p "eek, doesn't get past the lame check"
return false
end
p "does get past the lame check"
while @gotten.has_key?(@groups[subj]["messageinfo"][0][:messid]) == false
print "getting article: #{subj}\n" if Debuglevel > 0
print "full subject: #{@groups[subj]['messageinfo'][0][:subject]}\n" if Debuglevel > 0
print "message id: #{@groups[subj]['messageinfo'][i][:messid]}\n" if Debuglevel > 1
print "id: #{@groups[subj]['messageinfo'][i][:id]}\n" if Debuglevel > 1
print "server: #{@groups[subj]['messageinfo'][0][:server]}\n" if Debuglevel > 0
resp = false
while resp == false
resp, id, messid, list = get_body(@groups[subj]["messageinfo"][i][:server], @groups[subj]["messageinfo"][i][:messid])
if resp == false
print "mess-id i: #{@groups[subj]['messageinfo'][i][:messid]}\n"
# XXX this could be done more neatly
print "mess-id i+1: #{@groups[subj]['messageinfo'][i+1][:messid]}\n" if @groups[subj]["messageinfo"][i+1] != nil
if (i+1 < @groups[subj]["messageinfo"].length) and
(@groups[subj]["messageinfo"][i][:messid] == @groups[subj]["messageinfo"][i+1][:messid])
print "Trying next server...\n"
i += 1
else
raise TempError, "Message-id not on another server"
end
end
end
@gotten[@groups[subj]["messageinfo"][i][:messid]] = true
end
return list
end
def get_group_body_rest(subj, file=nil)
#p "get_group_body_rest"
result = []
for i in (1...@groups[subj]["messageinfo"].length)
unless @gotten.has_key?(@groups[subj]["messageinfo"][i][:messid])
print "getting article: #{i}\n" if Debuglevel > 1
print "getting article: #{subj}\n" if Debuglevel > 1
print "full subject: #{@groups[subj]['messageinfo'][i][:subject]}\n" if Debuglevel > 0
print "message id: #{@groups[subj]['messageinfo'][i][:messid]}\n" if Debuglevel > 1
print "id: #{@groups[subj]['messageinfo'][i][:id]}\n" if Debuglevel > 1
print "server: #{@groups[subj]['messageinfo'][i][:server]}\n" if Debuglevel > 0
resp = false
while resp == false
resp, id, messid, list = get_body(@groups[subj]["messageinfo"][i][:server], @groups[subj]["messageinfo"][i][:messid])
if resp == false
print "mess-id i: #{@groups[subj]["messageinfo"][i][:messid]}\n"
# print "mess-id i+1: #{@groups[subj]["messageinfo"][i+1][:messid]}\n"
# XXX this could be done more neatly
print "mess-id i+1: #{@groups[subj]["messageinfo"][i+1][:messid]}\n" if @groups[subj]["messageinfo"][i+1] != nil
if (i+1 < @groups[subj]["messageinfo"].length) and
(@groups[subj]["messageinfo"][i][:messid] == @groups[subj]["messageinfo"][i+1][:messid])
print "Trying next server...\n"
i += 1
else
raise TempError, "Message-id not on another server"
end
end
end
@gotten[ @groups[subj]["messageinfo"][i][:messid] ] = true
if file
list.collect{|line| file.print "#{line}\n"}
else
result.concat(list)
end
end
end
return result
end
def get_group_subjects
group_subjects unless @grouped
return @groups.keys
end
def group_is_complete(subj)
group_subjects unless @grouped
#print "Subject: #{subj}\n"
print "length: #{@groups[subj]["messageinfo"].length} total: #{@groups[subj]["total"].to_i}\n" if Debuglevel > 1
messids = []
@groups[subj]["messageinfo"].each {|x|
messids.push(x[:messid])
}
#p "group complete?: #{messids}"
umessids = messids.uniq
if (umessids.length ) >= @groups[subj]["total"].to_i
return true
else
return false
end
end
def group_is_singlepart(subj)
@groups[subj]["total"].to_i == 1
end
def group_is_multipart(subj)
@groups[subj]["total"].to_i > 1
end
def group_subjects
@groups = {}
for i in (0...@messageinfo.length)
print "group subjects: #{i} #{@messageinfo[i][:subject]}\n" if Debuglevel > 3
if @messageinfo[i][:subject] =~ /(.*)\((\d+)\/(\d+)\)(.*)/ || @messageinfo[i][:subject] =~ /(.*)\[(\d+)\/(\d+)\](.*)/
j = "#{$1}#{$4} (#{$3})"
number = $2
total = $3
else
j = @messageinfo[i][:subject]
number = 1
total = 1
end
if @groups.has_key?(j) and number.to_i != 0
@groups[j]["messageinfo"].push(@messageinfo[i])
elsif number.to_i != 0
@groups[j] = {}
@groups[j]["total"] = total
@groups[j]["messageinfo"] = [ (@messageinfo[i]) ]
end
end
@grouped = true
end
def set_skip_ids(server, ids)
set = Set::IntSpan.new(ids)
set.finite or return false
min = set.min
min != nil and min < 0 and return false
@connections[server]["skip_ids"] = set
return true
end
def group_update_newsrc(subject)
print "running group_update_newsrc\n"
for i in (0...@groups[subject]["messageinfo"].length)
if @connections[@groups[subject]["messageinfo"][i][:server]]
@connections[@groups[subject]["messageinfo"][i][:server]]["newsrc"].mark(@group, @groups[subject]["messageinfo"][i][:id])
end
end
end
def save_newsrc()
for server in @connections.keys
@connections[server]["newsrc"].save
end
end
def cache_add(cachedir, id, messid, subject, server)
if @cache_buf.has_key?(server)
@cache_buf[server].push("#{id}|#{messid}|#{subject}\n")
else
@cache_buf[server] = [ "#{id}|#{messid}|#{subject}\n" ]
end
if @cache_buf[server].length > 100
cache_save(cachedir, server)
end
end
def cache_check(cachedir)
if ! FileTest.exist?(cachedir)
print "Cachedir '#{cachedir}' doesn't exist, performance will suffer\n"
end
end
def cache_read(cachedir)
p "reading cache"
p Time.now
filename = "#{cachedir}/#{@group}.ripnewscache"
excludes = {}
for server in @connections.keys
cache_scrub(cachedir, server)
excludes[server] = {}
@connections[server]["skip_ids"].elements.collect!{|x| excludes[server][x]=true}
if FileTest.directory?( cachedir) and FileTest.file?( "#{filename}.#{server}" ) and FileTest.readable?( "#{filename}.#{server}" )
File.new( "#{filename}.#{server}" ).each{ |line|
id, messid, subject = line.split("|", 3)
unless excludes.has_key?(server) and excludes[server].has_key?(id.to_i) or
id.to_i < @connections[server]["first"] or
id.to_i > @connections[server]["last"]
if preselect(subject)
add(id, messid, subject, server)
end
@connections[server]["skip_ids"].insert(id.to_i)
end
}
end
end
p Time.now
#memusage
end
def cache_save(cachedir, server)
#p "writing cache"
#p Time.now
filename = "#{cachedir}/#{@group}.ripnewscache"
if FileTest.directory?( cachedir )
file = File.new( "#{filename}.#{server}", "a+" ) or print "couldn't open cachefile for writing\n"
# print "Updating cache...\n"
@cache_buf[server].sort!
# join explicitly: printing an Array only concatenates its elements on Ruby 1.8
file.print @cache_buf[server].join
file.close
@cache_buf[server] = []
# print "Cache updated for #{server}\n"
end
#p Time.now
end
def cache_scrub(cachedir, server)
# XXX this could and probably should be done in a separate thread...
# XXX but it'll work for now
# XXX also read articles aren't removed right now
# XXX this could be done, but I don't know if I want to pay the overhead
p "scrubbing cache"
p Time.now
filename = "#{cachedir}/#{@group}.ripnewscache"
if File.exists?("#{filename}.#{server}")
regexp = Regexp.new('^(\d+)\|')
infile = File.new("#{filename}.#{server}") or puts "Couldn't open cachefile for reading"
outfile = File.new("#{filename}.#{server}.new", "w") or puts "Couldn't open cachefile for writing"
infile.each{ |line|
if line =~ regexp
if $1.to_i >= @connections[server]["first"] and
$1.to_i <= @connections[server]["last"]
outfile.puts(line)
end
end
}
if ( File.move("#{filename}.#{server}.new", "#{filename}.#{server}") )
print "Cache scrubbed for #{server}\n"
else
print "Couldn't scrub #{server} cache\n"
end
end
p Time.now
end
###############################################################
# a base64 decoder...
def decode64(str)
string = ''
for line in str.split("\n")
line.delete!('^A-Za-z0-9+/') # remove non-base64 chars ('/' is part of the alphabet)
line.tr!('A-Za-z0-9+/', ' -_') # convert to uuencoded format
len = [32 + line.length * 3 / 4].pack("c")
# compute length byte
string += "#{len}#{line}".unpack("u")[0] # uudecode and concatenate
end
return string
end
###############################################################
def group_subject_sort(subj)
# XXX Why am I actually using sort_arr here instead of in-place sorting?
#print "Sorting articles\n"
serverhash = {}
for i in (0...@serverlist.length)
serverhash[@serverlist[i]] = i
end
total = @groups[subj]["total"]
sort_arr = []
#p "pre sort length: #{@groups[subj]['messageinfo'].length}"
for i in (0...@groups[subj]["messageinfo"].length)
print "subj sort #{@groups[subj]['messageinfo'][i][:subject]}\n" if Debuglevel > 2
print "subj sort #{@groups[subj]['messageinfo'][i][:messid]}\n" if Debuglevel > 2
print "subj sort #{@groups[subj]['messageinfo'][i][:id]}\n" if Debuglevel > 2
print "subj sort #{@groups[subj]['messageinfo'][i][:server]}\n" if Debuglevel > 2
sort_arr.push(
@groups[subj]["messageinfo"][i].dup
) if serverhash[@groups[subj]["messageinfo"][i][:server]] != nil
end
#p "sort_arr length pre sort: #{sort_arr.length}"
if sort_arr.length != 0
sort_arr.sort!{|a,b|
r = ward_sort(a[:subject], b[:subject])
if serverhash[a[:server]] == nil or serverhash[b[:server]] == nil
print "serverhash[a[:server]]: #{serverhash[a[:server]]}\n"
print "serverhash[b[:server]]: #{serverhash[b[:server]]}\n"
print "a[:server]: #{a[:server]}\n"
print "b[:server]: #{b[:server]}\n"
print "strange things going on here...\n"
end
if r == 0
r = serverhash[a[:server]] <=> serverhash[b[:server]]
end
r
}
end
@groups[subj].clear
@groups[subj]["total"] = total
#p "sort_arr length post sort: #{sort_arr.length}"
sort_arr.collect{|i|
if @groups[subj].has_key?("messageinfo")
@groups[subj]["messageinfo"].push(i)
else
@groups[subj]["messageinfo"] = [ i ]
end
print "subject sort: #{i[:subject]}\n" if Debuglevel > 2
print "server: #{i[:server]}\n" if Debuglevel > 2
}
#if ! @groups[subj]['messageinfo'].nil?
# p "post sort length: #{@groups[subj]['messageinfo'].length}"
#end
#print "Done sorting\n"
end
def ward_sort(a, b)
c = a.to_s.split(/([0-9]+)/)
d = b.to_s.split(/([0-9]+)/)
c.collect{|x|
y = d.shift
r = ((x.to_s =~ /^[0-9]+$/) && (y.to_s =~ /^[0-9]+$/)) ?
(x.to_i <=> y.to_i) :
(x.to_s <=> y.to_s)
if r != 0
return r
end
}
return -1 if (d != [])
return 0
end
def rechunk_runlist(runlist)
return nil if runlist == nil
chunksize = 500
blalist = runlist.split(',')
# hmmm, if the number of articles between the commas is < roughly 3
# then I think it is well worth fetching those 3 as well
# and doing fewer network requests...
# the way to do that would be something like removing that comma and
# one of the 2 numbers...
blalist.collect!{|x|
result = ""
if x =~ /(.*)-(.*)/
a = $1
while ($2.to_i - a.to_i) > chunksize
result << "#{a}-#{a.to_i+(chunksize-1)},"
a = a.to_i + chunksize
end
result << "#{a}-#{$2}"
else
x
end
}
blup = blalist.join(",")
return blup
end
def printerr(server)
print "Caught #{$!.class} reading from server #{server} (#{caller[0]})\n"
print "Error: #{$!}\n"
end
def disconnect
for server in @connections.keys
begin
@connections[server]["nntp"].quit
rescue Errno::EPIPE, Errno::ECONNRESET, EOFError, IOError
end
end
end
def quit
# just testing if these should be reset...
@messageinfo = []
disconnect
end
private :ward_sort
end # class


@ -0,0 +1,465 @@
# $Dwarf: newsrc.rb,v 1.12 2003/07/20 20:32:01 ward Exp $
# $Source$
#
# Copyright (c) 2002, 2003 Ward Wouts <ward@wouts.nl>
#
# Permission to use, copy, modify, and distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
#
require "set/intspan"
module News
class Newsrc
def initialize(file=nil)
@newsrc = { "group" => Hash.new, "list" => Array.new }
if file
unless load(file)
print "Can't load #{file}\n"
exit
end
end
end
def load(file=nil)
file = "#{ENV['HOME']}/.newsrc" unless file
@newsrc["file"] = file
@newsrc["group"] = {}
@newsrc["list"] = []
if FileTest.file?( "#{file}" ) and FileTest.readable?( "#{file}" )
lines = IO.readlines("#{file}")
import_rc(lines)
end
return true
end
def import_rc(lines)
@newsrc["group"] = {}
@newsrc["list"] = []
linenumber = 1
for line in lines
parse(line)
end
end
def parse(line)
unless line =~ /^([^!:]+)([!:])\s(.*)$/x
print "Newsrc.parse: Bad newsrc line: #{line}\n"
exit
end
name = $1
mark = $2
articles = $3
unless Set::IntSpan.valid(articles)
print "Newsrc.parse: Bad article list: #{line}\n"
end
group = { "name" => name, "subscribed" => (mark == ":"),
"articles" => Set::IntSpan.new(articles)}
@newsrc["group"][name] = group
@newsrc["list"].push(group)
end
def save
unless @newsrc.has_key?("file")
@newsrc["file"] = "#{ENV['HOME']}/.newsrc"
end
save_as(@newsrc["file"])
end
# this is not thread safe!
def save_as(file)
if FileTest.exists?("#{file}")
begin
File.rename(file, "#{file}.bak")
rescue
print "Can't rename #{file}, #{file}.bak: #{$!}\n"
exit
end
end
begin
newsrc = File.new(file, "w")
rescue
print "Can't open #{file}: #{$!}\n"
exit
end
@newsrc["file"] = file
for group in @newsrc["list"]
newsrc.print format(group)
end
newsrc.close
end
def save_group(group)
unless @newsrc.has_key?("file")
@newsrc["file"] = "#{ENV['HOME']}/.newsrc"
end
save_group_as(@newsrc["file"], group)
end
# this should be thread safe
def save_group_as(file, group)
if FileTest.exists?("#{file}")
if ( ! File.copy(file, "#{file}.bak") )
print "Can't copy #{file} to #{file}.bak: #{$!}\n"
end
end
begin
# keep the File object; flock returns 0, not the file
newsrc = File.new(file, "r+")
newsrc.flock(File::LOCK_EX)
rescue
print "Can't open #{file}: #{$!}\n"
exit
end
# read file
lines = newsrc.readlines
# pointer -> 0
newsrc.rewind
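# write back what was read, replacing only the line for this group
# (group is taken to be the group name; its entry is looked up in @newsrc)
for line in lines
if line =~ /^#{Regexp.escape(group)}[:!]/
newsrc.print format(@newsrc["group"][group])
else
newsrc.print line
end
end
# drop leftover bytes in case the rewritten content is shorter
newsrc.flush
newsrc.truncate(newsrc.pos)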
newsrc.flock(File::LOCK_UN) # what's the right order here?
newsrc.close
end
def format(group)
name = group["name"]
sub = group["subscribed"] ? ':' : '!'
articles = group["articles"].run_list
#space = articles ? ' ' : ''
#return "#{name}#{sub}#{space}#{articles}\n"
return "#{name}#{sub} #{articles}\n"
end
def export_rc
lines = @newsrc["list"].collect{ |group|
name = group["name"]
sub = group["subscribed"] ? ':' : '!'
articles = group["articles"].run_list
space = articles ? ' ' : ''
"#{name}#{sub}#{space}#{articles}\n" }
return lines
end
def add_group(name, options)
if @newsrc["group"].has_key?(name)
options.has_key?("replace") or return false
del_group(name)
end
group = {"name" => name,
"subscribed" => true,
"articles" => Set::IntSpan.new }
@newsrc["group"][name] = group
_insert(group, options)
return true
end
def move_group(name, options)
if @newsrc["group"].has_key?(name)
group = @newsrc["group"][name]
else
return false
end
@newsrc["list"] = @newsrc["list"].delete_if{|x| x["name"] == name}
_insert(group, options)
return true
end
def _insert(group, options)
list = @newsrc["list"]
where = ""
arg = ""
if options.has_key?("where")
where = options["where"]
end
# split an Array such as ["before", "some.group"] into keyword and argument
if where.kind_of?(Array)
where, arg = where[0], where[1]
end
case where.to_s
when "first"
@newsrc["list"].unshift(group)
when "last"
@newsrc["list"].push(group)
when ""
@newsrc["list"].push(group) # default
when "alpha"
alpha(group)
when "before"
before(group, arg)
when "after"
after(group, arg)
when "number"
number(group, arg)
end
end
def alpha (group)
name = group["name"]
for i in (0...@newsrc["list"].length)
if ((name <=> @newsrc["list"][i]["name"]) == -1)
upper = @newsrc["list"].slice!(i..@newsrc["list"].length)
@newsrc["list"].push(group)
@newsrc["list"].concat(upper) # concat keeps the list flat; push would nest the array
return
end
end
@newsrc["list"].push(group)
end
def before(group, before)
name = group["name"]
for i in (0...@newsrc["list"].length)
if (@newsrc["list"][i]["name"] == before.to_s)
upper = @newsrc["list"].slice!(i..@newsrc["list"].length)
@newsrc["list"].push(group)
@newsrc["list"].concat(upper) # concat keeps the list flat; push would nest the array
return
end
end
@newsrc["list"].push(group)
end
def after(group, after)
name = group["name"]
for i in (0...@newsrc["list"].length)
if (@newsrc["list"][i]["name"] == after.to_s)
upper = @newsrc["list"].slice!((i+1)..@newsrc["list"].length)
@newsrc["list"].push(group)
@newsrc["list"].concat(upper) # concat keeps the list flat; push would nest the array
return
end
end
@newsrc["list"].push(group)
end
def number(group, offset)
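# offset is a plain position, not an array
offset = offset.to_i
offset = @newsrc["list"].length if offset > @newsrc["list"].length
upper = @newsrc["list"].slice!(offset..@newsrc["list"].length)
@newsrc["list"].push(group)
@newsrc["list"].concat(upper) # concat keeps the list flat; push would nest the array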
end
def del_group(name)
if @newsrc["group"].has_key?(name)
group = @newsrc["group"][name]
else
return false
end
@newsrc["group"].delete(name)
@newsrc["list"] = @newsrc["list"].delete_if{|x| x["name"] == name}
return true
end
def subscribe(name, options = {"where" => ""})
unless @newsrc["group"].has_key?(name)
add_group(name, options)
end
@newsrc["group"][name]["subscribed"] = true
end
def unsubscribe(name, options = {"where" => ""})
unless @newsrc["group"].has_key?(name)
add_group(name, options)
end
@newsrc["group"][name]["subscribed"] = false
end
def mark(name, article, options = {"where" => ""})
unless @newsrc["group"].has_key?(name)
add_group(name, options)
end
@newsrc["group"][name]["articles"].insert(article)
end
def mark_list(name, list, options = {"where" => ""})
unless @newsrc["group"].has_key?(name)
add_group(name, options)
end
articles = @newsrc["group"][name]["articles"].union(list)
@newsrc["group"][name]["articles"] = articles
end
def mark_range(name, from, to, options = {"where" => ""})
unless @newsrc["group"].has_key?(name)
add_group(name, options)
end
range = Set::IntSpan.new("#{from}-#{to}")
articles = @newsrc["group"][name]["articles"].union(range)
@newsrc["group"][name]["articles"] = articles
end
def unmark(name, article, options = {"where" => ""})
unless @newsrc["group"].has_key?(name)
add_group(name, options)
end
@newsrc["group"][name]["articles"].remove(article)
end
def unmark_list(name, list, options = {"where" => ""})
unless @newsrc["group"].has_key?(name)
add_group(name, options)
end
articles = @newsrc["group"][name]["articles"].diff(list)
@newsrc["group"][name]["articles"] = articles
end
def unmark_range(name, from, to, options = {"where" => ""})
unless @newsrc["group"].has_key?(name)
add_group(name, options)
end
range = Set::IntSpan.new("#{from}-#{to}")
articles = @newsrc["group"][name]["articles"].diff(range)
@newsrc["group"][name]["articles"] = articles
end
def exists(name)
return @newsrc["group"].has_key?(name) ? true : false
end
def subscribed(name)
exists(name) and @newsrc["group"][name]["subscribed"]
end
def marked(name, article)
exists(name) and @newsrc["group"][name]["articles"].member(article)
end
def num_groups
return @newsrc["list"].length
end
def groups
list = @newsrc["list"].dup
list.collect!{|x| x["name"]}
end
def sub_groups
list = @newsrc["list"].dup
# compact! returns nil when nothing was removed, so use compact
list.collect!{|x| x["subscribed"] ? x["name"] : nil}.compact
end
def unsub_groups
list = @newsrc["list"].dup
# compact! returns nil when nothing was removed, so use compact
list.collect!{|x| x["subscribed"] ? nil : x["name"]}.compact
end
def marked_articles(name, options = {"where" => ""})
unless @newsrc["group"].has_key?(name)
add_group(name, options)
end
return @newsrc["group"][name]["articles"].elements
end
def unmarked_articles(name, from, to, options = {"where" => ""})
unless @newsrc["group"].has_key?(name)
add_group(name, options)
end
range = Set::IntSpan.new("#{from}-#{to}")
return range.diff(@newsrc["group"][name]["articles"]).elements
end
def get_articles(name, options = {"where" => ""})
unless @newsrc["group"].has_key?(name)
add_group(name, options)
end
@newsrc["group"][name]["articles"].run_list
end
def set_articles(name, articles, options = {"where" => ""})
Set::IntSpan.valid(articles) or return false
set = Set::IntSpan.new(articles)
set.finite or return false
min = set.min
min != nil and min < 0 and return false
unless @newsrc["group"].has_key?(name)
add_group(name, options)
end
@newsrc["group"][name]["articles"] = set
return true
end
end # class
end # module
# TODO
# Do not kill an item until it's tested!
# [x] new
# [x] load
# [ ] _scan # Initializes a Newsrc object from a string. Used for testing.
# [x] import_rc
# [x] parse # parses a single line from a newsrc file
# [x] save
# [x] save_as
# [ ] save_group
# [ ] save_group_as
# [x] format
# [x] export_rc
# [ ] _dump # Formats a Newsrc object to a string. Used for testing
# [x] add_group
# [x] move_group
# [x] Splice(\@$$@) # is now called number and is simpler
# [x] _insert
# [x] Alpha
# [x] Before
# [x] After
# [x] del_group
# [x] subscribe
# [x] unsubscribe
# [x] mark
# [x] mark_list
# [x] mark_range
# [x] unmark
# [x] unmark_list
# [x] unmark_range
# [x] exists
# [x] subscribed
# [x] marked
# [x] num_groups
# [x] groups
# [x] sub_groups
# [x] unsub_groups
# [x] marked_articles
# [x] unmarked_articles
# [x] get_articles
# [x] set_articles
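
The checklist above mentions a parse step that handles a single newsrc line. As a rough illustration only, the sketch below splits one line of the conventional .newsrc format into group name, subscription flag and article run list. The helper name parse_newsrc_line and the returned hash layout are assumptions made for this example; they are not the module's actual implementation.

```ruby
# Hypothetical helper: split one .newsrc line into its three parts.
# ":" after the group name means subscribed, "!" means unsubscribed.
def parse_newsrc_line(line)
  m = /\A(\S+?)([:!])\s*(.*)\z/.match(line.chomp)
  return nil unless m # not a valid newsrc line
  {
    "name"       => m[1],
    "subscribed" => m[2] == ":",
    "articles"   => m[3] # run list such as "1-10,12", left unparsed here
  }
end

entry = parse_newsrc_line("alt.test: 1-10,12\n")
puts "#{entry['name']} subscribed=#{entry['subscribed']} articles=#{entry['articles']}"
```

The run list is kept as a string here; in the real module it is handed to Set::IntSpan.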

File diff suppressed because one or more lines are too long


@ -0,0 +1,16 @@
#!/usr/local/bin/ruby
require '../newsrc.rb'
def test1
print "Test 1\n"
@newsrc = News::Newsrc.new("newsrc.news.wizeazz.nl")
print @newsrc.get_articles("alt.binaries.sounds.mp3.gothic-industrial")
print "\n"
@newsrc.unmark_range("alt.binaries.sounds.mp3.gothic-industrial", 0, 2394540)
print @newsrc.get_articles("alt.binaries.sounds.mp3.gothic-industrial")
print "\n"
end
test1


@ -0,0 +1,18 @@
./CHANGELOG
./INSTALL
./README
./TODO
./encode/tests/testdata
./encode/tests/testdata.uu
./encode/tests/testdata.ync
./encode/tests/uu_test.rb
./encode/tests/yenc_test.rb
./encode/uuencode.rb
./encode/yenc.rb
./net/nntp.rb
./news/tests/newsrc.news.wizeazz.nl
./news/tests/newsrc_test.rb
./news/article.rb
./news/newsrc.rb
./ripnews.rb
./set/intspan.rb


@ -0,0 +1,10 @@
Make sure everything is cleaned up. Go to the ripnews worktree and tag the
release.
cvs -q tag ripnews-release_0_2_2
After this the release files can be tarred. The file "release-tar" lists
the filenames that must be included, and the convenient way to use it
is:
tar -czvf /tmp/ripnews-0.2.2.tgz -s /./ripnews-0.2.2/ -I notes/release-tar
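
The two commands above differ between releases only in the version number. A minimal sketch of deriving them from a version string, assuming the naming pattern shown in these notes stays the same; the helper release_names is hypothetical and not part of ripnews:

```ruby
# Hypothetical helper: build the CVS tag, tarball path and archive prefix
# for a given version, following the pattern used in the notes.
def release_names(version)
  {
    "tag"     => "ripnews-release_#{version.tr('.', '_')}",
    "tarball" => "/tmp/ripnews-#{version}.tgz",
    "prefix"  => "ripnews-#{version}/"
  }
end

names = release_names("0.2.2")
# Print the commands instead of running them, so they can be reviewed first.
puts "cvs -q tag #{names['tag']}"
puts "tar -czvf #{names['tarball']} -s /./#{names['prefix']} -I notes/release-tar"
```

Printing rather than executing keeps the sketch free of side effects.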


@ -0,0 +1,717 @@
#!/usr/local/bin/ruby -w
# $Dwarf: ripnews.rb,v 1.100 2005/02/05 08:27:29 ward Exp $
# $Source$
#
# Copyright (c) 2002, 2003, 2004, 2005 Ward Wouts <ward@wouts.nl>
#
# Permission to use, copy, modify, and distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
#
require 'date'
require 'ftools'
require 'time'
require 'getoptlong'
require 'news/article'
require 'news/newsrc'
require 'tempfile'
require 'thread'
require 'encode/uuencode'
require 'encode/yenc'
###########################################################################
###########################################################################
# memory profiling stuff
MEntry = Struct.new( "MEntry", :c, :mem )
class MEntry; def to_s() "#{c} : #{mem}"; end; end
GroupEntry = Struct.new( "GroupEntry", :c, :mem, :total )
class GroupEntry; def to_s() "#{mem}\t\t#{c} x#{total}"; end; end
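# Profiling is disabled: profile_mem is a no-op stub. To enable it, swap the
# names of profile_mem and aprofile_mem.
def profile_mem(group)
end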
def aprofile_mem(group)
t = Thread.new {
groups = {}
ObjectSpace.each_object { |x|
if not [Array,Hash].include? x.class
e = nil
begin
e = MEntry.new( x.class, Marshal::dump(x).size )
rescue TypeError # undumpable
e = MEntry.new( x.class, 0 )
end
if groups.has_key? e.c
groups[e.c].mem += e.mem
groups[e.c].total += 1
else
groups[e.c] = GroupEntry.new( e.c, e.mem, 1 )
end
end
}
File.open( "mem_log", "a+" ) { |file|
file << "Group #{group}\n"
total = 0
file << "bytes/class/count\n"
groups.to_a.sort_by { |e| e[1].mem }.each { |e|
file << "#{e[1]}\n"; total += e[1].mem }
file << "TOTAL == #{total}\n\n"
}
}
sleep 10
t.join
end
###########################################################################
Debuglevel = 0
@tstart = Time.now
def save_file(dir, name, data)
print "savename: #{name}\n" if Debuglevel > 1
nname = name.gsub(/\//, "-")
nname.sub!(/\s*$/, "")
nname.sub!(/^[\s\.-]*/, "")
print "nname: #{nname}\n" if Debuglevel > 1
newname = nname
count = 1
d = Date.today
date = d.strftime("%Y%m%d") # zero-padded, so the date is unambiguous
while FileTest.exists?("#{dir}/#{newname}")
newname = "#{nname}-<#{date}.#{count}>"
count += 1
end
print "name: #{newname}\n" if Debuglevel > 1
case data.class.to_s
when "String"
begin
if File.move(data, "#{dir}/#{newname}")
print " Saving as: '#{newname}'\n"
else
print "couldn't rename tempfile\n"
return false
end
rescue Errno::ENOENT
print "Caught Errno::ENOENT (save_file)\n"
print "Error: #{$!}\n"
print "What the *beep* happened?\n"
return false
end
when "Array"
begin
# the block form closes (and flushes) the file when done
File.open("#{dir}/#{newname}", "w", 0644) {|file|
data.each{|i| file.print "#{i}"}
}
print " Saving as: '#{newname}'\n"
rescue SystemCallError
print "couldn't open file for writing\n"
return false
end
when "Tempfile"
begin
if File.move(data.path, "#{dir}/#{newname}")
print " Saving as: '#{newname}'\n"
else
print "couldn't rename tempfile\n"
return false
end
rescue Errno::ENOENT
print "Caught Errno::ENOENT (save_file)\n"
print "Error: #{$!}\n"
print "What the *beep* happened?\n"
return false
end
else
print "EEEEPS Can't save data of class: #{data.class.to_s}\n"
return false
end
return true
end
def parse_options(options)
begin
opts = GetoptLong.new(
[ "-I", "--include", GetoptLong::REQUIRED_ARGUMENT ],
[ "-c", "--configfile", GetoptLong::REQUIRED_ARGUMENT ],
[ "-L", "--longname", GetoptLong::NO_ARGUMENT ],
[ "-C", "--combinedname", GetoptLong::NO_ARGUMENT ],
[ "-M", "--multipart", GetoptLong::NO_ARGUMENT ],
[ "-S", "--singlepart", GetoptLong::NO_ARGUMENT ],
[ "-T", "--test", GetoptLong::NO_ARGUMENT ],
[ "-X", "--exclude", GetoptLong::REQUIRED_ARGUMENT ]
)
opts.quiet=true
opts.each do |opt, arg|
options[opt] = arg
end
rescue GetoptLong::Error # also covers missing arguments, not only invalid options
print "#{$!}\n"
usage
end
return options
end
def usage
print "\nUsage:\n\n"
print "ripnews.rb [-I <pattern>] [-c <file>] [-L] [-C] [-M] [-S] [-T] [-X <pattern>]\n\n"
print "-I <pattern> specify an include pattern\n"
print "-c <file> specify an alternate configfile\n"
print "-L use subject as filename\n"
print "-C use combined filenames\n"
print "-M get multipart articles\n"
print "-S get singlepart articles\n"
print "-T test mode, don't update newsrc file\n"
print "-X <pattern> specify an exclude pattern\n"
exit
end
def parse_config(default = {})
print "Parsing config\n"
print "#{default['-c']}\n"
if FileTest.readable?("#{default['-c']}")
# File.readlines closes the file again, unlike File.new(...).readlines
lines = File.readlines("#{default['-c']}")
else
lines = []
end
i = 0
group = ""
grouparr = []
config = {}
lines.collect!{|x|
x.gsub!(/\$\{HOME\}/, "#{ENV['HOME']}")
if x =~ /^\s*INCLUDEFILE=(.*?)\s*$/i
x = File.readlines($1)
end
x
}
lines.flatten!
lines.collect!{|x|
x.sub!(/^\s*/, "")
x.sub!(/\#.*$/, "")
x.sub!(/\s*$/, "")
x.gsub!(/\$\{HOME\}/, "#{ENV['HOME']}")
x.chomp
}
while i < lines.length
line = lines[i]
# a trailing backslash on the very last line has nothing to continue with
while line.sub!(/\s*\\$/, "") != nil and i+1 < lines.length
line << lines[i+1]
i += 1
end
line.sub!(/\s*$/, "")
i += 1
if line =~ /^OPT_(.*?)=(.*)/
line = "-#{$1}=#{$2}"
end
print "#{i}: #{line}\n" if Debuglevel > 1
if line =~ /(.*?)\s*\+=\s*(.*)/
if group == ""
if default.has_key?($1)
default[$1] << $2
else
default[$1] = $2
end
else
grouparr.collect{|g|
if config[g].has_key?($1)
config[g][$1] << $2
elsif default.has_key?($1)
config[g][$1] = default[$1] + $2
else
config[g][$1] = $2
end
}
end
elsif line =~ /(.*?)\s*=\s*(.*)/
if group == ""
default[$1] = $2
else
grouparr.collect{|g|
config[g][$1] = $2
}
end
elsif line =~ /(.*?)\s*\{/
group = $1
grouparr = group.split('|')
grouparr.collect{|g|
config[g] = {} unless config.has_key?(g)
}
elsif line =~ /^\}$/
default.each_key{|x|
grouparr.collect{|g|
config[g][x] = default[x] unless config[g].has_key?(x)
}
}
group = ""
grouparr = []
elsif line =~ /^$/
next
else
print "Error parsing config on line: #{i}\n"
return false
end
end
if group != ""
print "Error parsing config: group not terminated on line #{i}\n"
return false
end
if Debuglevel > 2
config.each_key{|x|
print "Group: #{x}\n"
config[x].each_key{|y|
print "Key: '#{y}' => Value: '#{config[x][y]}'\n"
}
}
end
return config
end
def check_config
if @config.length == 0
print "No configuration, nothing to do\n"
exit
end
@config.each_key {|i|
unless @config[i].has_key?("-I")
print "No inclusions given for group #{i}. Won't match anything.\n"
end
@config[i]["DATADIR"] ="." unless @config[i].has_key?("DATADIR")
@config[i]["PERMISSION"] = "0755" unless @config[i].has_key?("PERMISSION")
if @config[i].has_key?("EXTENSIONS")
@config[i]["-S"] = @config[i]["EXTENSIONS"]
@config[i]["-M"] = @config[i]["EXTENSIONS"]
end
if @config[i].has_key?("DELEXT")
@config[i]["-SD"] = @config[i]["DELEXT"]
@config[i]["-MD"] = @config[i]["DELEXT"]
end
@config[i]["-M"] = "(?!.*)" if @config[i].has_key?("-S") and ! @config[i].has_key?("-M")
@config[i]["-S"] = "(?!.*)" if @config[i].has_key?("-M") and ! @config[i].has_key?("-S")
}
end
def lock
group = @config.keys[0]
if @config[group].has_key?("LOCKFILE")
if FileTest.exists?(@config[group]["LOCKFILE"])
lock = File.open(@config[group]["LOCKFILE"], "r")
pid = lock.gets
lock.close
if pid
pid.chomp!
begin
Process.kill(0, pid.to_i)
print "Already running, exiting...\n"
exit
rescue Errno::ESRCH
print "Stale lock found... removing...\n"
File.unlink(@config[group]["LOCKFILE"])
end
else
print "Empty lockfile found... removing...\n"
File.unlink(@config[group]["LOCKFILE"])
end
end
lock = File.new(@config[group]["LOCKFILE"], "w")
lock.print "#{Process.pid}\n"
lock.close
end
end
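def unlock
group = @config.keys[0]
# only remove the lockfile if one was configured (mirrors the check in lock)
File.unlink(@config[group]["LOCKFILE"]) if @config[group].has_key?("LOCKFILE")
end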
def renice
group = @config.keys[0]
if @config[group].has_key?("NICE")
Process.setpriority(Process::PRIO_PROCESS, 0, @config[group]["NICE"].to_i)
end
end
def get_single(subj, group)
print "Fetching singlepart article: #{subj}\n"
body = @articles.get_group_body(subj)
if UUEncode.is_uuencoded(body)
filename = UUEncode.get_filename(body)
print " filename #{filename}\n"
unless check_ext(group, filename, "s", subj)
print " Skipping article...\n"
return false
end
print " UUDecoding...\n"
mode, filename, body = UUEncode.uudecode(body)
elsif YEnc.is_yencoded(body)
filename = YEnc.get_filename(body)
unless check_ext(group, filename, "s", subj)
print " Skipping article...\n"
return false
end
print " YDecoding...\n"
mode, filename, body = YEnc.ydecode(body)
else
print " Unknown encoding (not UU, not yEnc), skipping...\n"
return false
end
if mode == false
print " Decoding failed skipping article...\n"
return false
end
output_data(subj, mode, filename, body)
return true
end
def get_multi(subj, group)
print "Fetching multipart article: #{subj}\n"
body = @articles.get_group_body_first(subj)
if UUEncode.is_uuencoded(body) or YEnc.is_yencoded(body)
if UUEncode.is_uuencoded(body)
filename = UUEncode.get_filename(body)
print " filename #{filename}\n"
unless check_ext(group, filename, "m", subj)
print " Skipping article...\n"
return false
end
elsif YEnc.is_yencoded(body)
filename = YEnc.get_filename(body)
print " filename #{filename}\n"
unless check_ext(group, filename, "m", subj)
print " Skipping article...\n"
return false
end
end
if @config[group]["TEMPDIR"] == nil or @config[group]["TEMPDIR"] == ""
bodyrest = @articles.get_group_body_rest(subj)
unless bodyrest
print " Skipping article...\n"
return false
end
body.concat(bodyrest)
else
file = Tempfile.new("riptmp", @config[group]["TEMPDIR"])
body.collect{|x| file.print "#{x}\n"}
unless @articles.get_group_body_rest(subj, file)
print " Skipping article...\n"
return false
end
fileout = Tempfile.new("riptmp", @config[group]["TEMPDIR"])
end
@decode_threads << Thread.new(body, file, fileout, subj) do |tbody, tfile, tfileout, tsubj|
# immediately stop to continue with main program
Thread.stop
puts "inside thread\n"
if UUEncode.is_uuencoded(tbody)
print " UUDecoding...\n"
if tfile
tmode, tfilename, tbody = UUEncode.uudecode(tfile, tfileout)
else
tmode, tfilename, tbody = UUEncode.uudecode(tbody)
end
elsif YEnc.is_yencoded(tbody)
print " YDecoding...\n"
begin
if tfile
tmode, tfilename, tbody = YEnc.ydecode(tfile, tfileout)
else
tmode, tfilename, tbody = YEnc.ydecode(tbody)
end
rescue YencError
# XXX if there is a yenc problem I want the data so I can research it
output_data(tsubj, 0600, "YencProblem", tbody)
# XXX return success even though it's not true
Thread.current.exit
rescue PermError
print "#{$!}\n"
print " Skipping article...\n"
Thread.current.exit
end
end
if tmode == false
print " Decoding failed skipping article...\n"
Thread.current.exit
end
if tfile
# horrible cheat to not lose the outputted file
tbody = tfileout.path
tbodybase = tbody.sub(/\/[^\/]*$/, "/ripnewsdecode")
i = 1
while FileTest.exists?("#{tbodybase}-#{i}")
i += 1
end
File.move(tbody, "#{tbodybase}-#{i}")
tbody = "#{tbodybase}-#{i}"
tfile.close
tfileout.close(false)
end
output_data(tsubj, tmode, tfilename, tbody)
end # thread end
@decode_threads.each{ |thr|
if thr.status == "sleep" # and fire up the threads again
thr.run
elsif thr.status == false # join finished threads (status is the boolean false, not a string)
thr.join
# else
# p thr.status
end
}
puts "outside thread\n"
return true
else
print " Unknown encoding (not UU, not yEnc), skipping...\n"
return false
end
end
def fill_preselector(group)
if @config[group].has_key?("-I")
@articles.set_preselect_pattern(Regexp.new(@config[group]["-I"]))
end
end
def output_data(subject, mode, filename="", body="")
group = @articles.get_groupname
print " mode: #{mode}\n" if Debuglevel > 0
print " Filename: '#{filename}'\n" if Debuglevel > 0
# de-crap subject...
sub = subject.sub(/\s*$/, "") # strip trailing spaces
sub.sub!(/^[\s\.!#-]*/, "") # strip leading spaces, dots, exclamation points, dashes and hashes
# decide on a filename
if @config[group].has_key?("-L") and @config[group]["-L"]
print "longname\n" if Debuglevel > 1
outfile = sub[0...@maxfilelength]
elsif @config[group].has_key?("-C") and @config[group]["-C"]
print "combinedname\n" if Debuglevel > 1
outfile = sub[0...@maxfilelength-filename.length-3]
outfile = "#{outfile} [#{filename}]"
if outfile.length > @maxfilelength
outfile = filename[0...@maxfilelength]
end
else
print "shortname\n" if Debuglevel > 1
outfile = filename[0...@maxfilelength]
end
# do the actual saving
if save_file("#{@config[group]["DATADIR"]}/#{group}", outfile, body)
@newsrc_lock.synchronize {
@articles.group_update_newsrc(subject)
@articles.save_newsrc unless @config[group].has_key?("-T") and @config[group]["-T"]
}
end
end
def check_ext(group, filename, mode, subject)
case mode
when "s"
if @config[group].has_key?("-SD") && ( filename =~ /\.(#{@config[group]["-SD"]})$/ )
print "Marking '#{subject}' as read\n"
@articles.group_update_newsrc(subject)
return false
end
return @config[group].has_key?("-S") ? ( filename =~ /\.(#{@config[group]["-S"]})$/ ) : true
when "m"
if @config[group].has_key?("-MD") && ( filename =~ /\.(#{@config[group]["-MD"]})$/ )
print "Marking '#{subject}' as read\n"
@articles.group_update_newsrc(subject)
return false
end
return @config[group].has_key?("-M") ? ( filename =~ /\.(#{@config[group]["-M"]})$/ ) : true
else
print "Illegal mode \"#{mode}\" in check_ext\n"
exit
end
end
def get_max_file_length(tempdir=".")
if ! FileTest.directory?("#{tempdir}") || ! FileTest.writable?("#{tempdir}")
print "Tempdir '#{tempdir}' is not a writable directory\n"
exit
end
# this is quite stupid, there is no guarantee at all the generated file names
# don't already exist
name = "a"*500
name = "#$$#{name}"
begin
file = File.new("#{tempdir}/#{name}", "w", 0644).close
File.delete("#{tempdir}/#{name}")
rescue Errno::ENAMETOOLONG
name = name[0...-1]
retry
rescue Errno::ENOENT
print "#{$!}\n"
print "strange...\n"
retry
end
# this is how many characters are still likely to be appended
# if the filename already exists: '-<#{date}.#{count}>' in save_file
# this could be brought back to 5: '-<#{count}>' ...
return name.length - 14
end
def ward_sort(a, b)
c = a.to_s.split(/([0-9]+)/)
d = b.to_s.split(/([0-9]+)/)
c.collect{|x|
y = d.shift
r = ((x.to_s =~ /^[0-9]+$/) && (y.to_s =~ /^[0-9]+$/)) ?
(x.to_i <=> y.to_i) :
(x.to_s <=> y.to_s)
if r != 0
return r
end
}
return -1 if (d != [])
return 0
end
def startup
$stdout.sync=true # line buffered output
@defaults = {'-c' => "#{ENV['HOME']}/.ripnewsrc"}
@defaults = parse_options(@defaults)
@config = parse_config(@defaults)
exit if @config == false
check_config
lock
renice
trap("HUP") {
print "Rereading config...\n"
config = parse_config(@defaults)
if config != false
@config = config
check_config
print "Done reading config\n"
else
print "Keeping old config due to errors\n"
end
}
@maxfilelength = get_max_file_length(@config[@config.keys[0]]["TEMPDIR"])
print "\n$Id$\n"
print "Starting: #{@tstart}\n"
if Debuglevel > 2
@config.each_key{|i|
print "Group: #{i}\n"
@config[i].each_key{|j|
print "Opt: #{j} val: #{@config[i][j]}\n"
}
}
end
end
def main
profile_mem("out side of loop still")
for group in @config.keys.sort
@decode_threads = []
@newsrc_lock = Mutex.new
profile_mem("#{group} start")
# puts "object count:"
# puts ObjectSpace.each_object(){}
print "\nGetting articles for #{group}\n"
@articles = Article.new(@config[group]["NNTPSERVER"], group, @config[group]["NEWSRCNAME"])
fill_preselector(group)
print "initialized\n"
@articles.get_articles(@config[group]["CACHEDIR"])
profile_mem("#{group} articles read")
unless FileTest.directory?("#{@config[group]["DATADIR"]}/#{group}") or
Dir.mkdir("#{@config[group]["DATADIR"]}/#{group}", @config[group]["PERMISSION"].oct)
print "eeeps, couldn't create dir\n"
exit
end
for i in @articles.get_group_subjects.sort{|a, b| ward_sort(a, b)}
print "#{i}\n" if Debuglevel > 2
if @config[group].has_key?("-MR") and i =~ /#{@config[group]["-MR"]}/
print "Marking '#{i}' as read\n"
@articles.group_update_newsrc(i)
next
end
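# without an -I pattern nothing is fetched (an empty regexp would match everything)
if @config[group].has_key?("-I") and
!(@config[group].has_key?("-X") and i =~ /#{@config[group]["-X"]}/) and
i =~ /#{@config[group]["-I"]}/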
print "Match: #{i}\n" if Debuglevel > 0
if @articles.group_is_complete(i)
begin
if @articles.group_is_singlepart(i)
get_single(i, group)
elsif @articles.group_is_multipart(i)
get_multi(i, group)
end
#rescue Article::TempError, Article::PermError
rescue TempError, PermError
print "#{$!}\n"
print " Skipping article...\n"
#print "Caught #{$!.class}\n"
#print "Error: #{$!}\n"
next
end
else
print "Not complete: #{i}\n"
end
end
end
# wait here for any remaining threads...
if @decode_threads
@articles.disconnect
puts "Waiting for decode threads..."
@decode_threads.each{|thr| thr.join}
puts "Decode threads all done"
end
@articles.quit
@articles = nil
profile_mem("#{group} pre-GC")
GC.start
profile_mem("#{group} end")
end
end
def ending
tend = Time.now
print "\nFinished: #{tend}\n"
runtime = (tend - @tstart).to_i
h=runtime/3600
m=runtime%3600
s=m%60
m=m/60
printf("Running time: %02d:%02d:%02d\n", h, m, s)
unlock
end
startup
main
ending
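
The subject ordering done by ward_sort above (digit runs compared as numbers, everything else as text) can be sketched in isolation as follows. This is an illustrative rewrite under that reading of the code, not the program's own routine; natural_key and natural_compare are names invented for the example.

```ruby
# Split a string into digit and non-digit chunks; digit chunks become Integers.
def natural_key(s)
  s.to_s.scan(/\d+|\D+/).map { |part| part =~ /\A\d+\z/ ? part.to_i : part }
end

# Compare chunk by chunk: numerically when both chunks are numbers,
# as strings otherwise. A string that is a prefix of the other sorts first.
def natural_compare(a, b)
  ka = natural_key(a)
  kb = natural_key(b)
  ka.each_with_index do |x, idx|
    y = kb[idx]
    return 1 if y.nil? # a has more chunks than b
    r = if x.is_a?(Integer) && y.is_a?(Integer)
          x <=> y
        else
          x.to_s <=> y.to_s
        end
    return r unless r == 0
  end
  ka.length < kb.length ? -1 : 0
end

subjects = ["file (10/12)", "file (2/12)", "file (1/12)"]
puts subjects.sort { |a, b| natural_compare(a, b) }
```

Sorting this way keeps the parts of a series in posting order, which is what the changelog entry about fetching subjects sorted is after.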


@ -0,0 +1,934 @@
# $Dwarf: intspan.rb,v 1.14 2003/07/20 20:32:24 ward Exp $
# $Source$
#
# Copyright (c) 2002, 2003 Ward Wouts <ward@wouts.nl>
#
# Permission to use, copy, modify, and distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
#
module Set
class IntSpan
Empty_String = '-'
Debuglevel = 0
def initialize(setspec=nil)
@set = { "empty_string" => Empty_String }
print "initialize: Calling copy\n" if Debuglevel > 0
copy(setspec)
end
def IntSpan.valid(run_list)
testset = new
begin
testset._copy_run_list(run_list)
rescue SystemExit
return false
end
return true
end
def copy(set_spec)
print "Copy #{set_spec.class.to_s}\n" if Debuglevel > 0
case set_spec.class.to_s
when "NilClass"
print "copy: Calling _copy_empty\n" if Debuglevel > 0
_copy_empty
when "String"
print "copy: Calling _copy_run_list\n" if Debuglevel > 0
_copy_run_list(set_spec)
when "Array"
print "copy: Calling _copy_array\n" if Debuglevel > 0
_copy_array(set_spec)
when "Set::IntSpan"
print "copy: Calling _copy_set\n" if Debuglevel > 0
_copy_set(set_spec)
when "Hash"
print "copy: Calling _copy_set\n" if Debuglevel > 0
_copy_set(set_spec)
else
print "eeps\n"
end
end
def _copy_empty # makes @set the empty set
# assign key by key so other entries (e.g. "empty_string") survive
@set["negInf"] = false
@set["posInf"] = false
@set["edges"] = []
@set["run"] = []
end
def _copy_array(array) # copies an array into @set
@set["negInf"] = false
@set["posInf"] = false
#print "scary thingy gets called!!!\n"
edges = []
for element in array.sort
next if (edges.length > 0) and (edges[-1] == element) # skip duplicates
if (edges.length > 0) and (edges[-1] == element-1)
edges[-1] = element
else
edges.push(element-1, element)
end
end
@set["edges"] = edges
@set["run"] = []
end
def _copy_set(src) # copies one set to another
@set["negInf"] = src.neg_inf
@set["posInf"] = src.pos_inf
@set["edges"] = src.edges.dup # copy, so the two sets don't share one array
@set["run"] = []
end
def _copy_run_list(runlist)
_copy_empty
runlist = runlist.gsub(/\s|_/, '') # work on a copy; leave the caller's string alone
# "" and the empty-set string both denote the empty set
return true if runlist == "" or runlist == Empty_String
print "copy run list...\n" if Debuglevel > 0
first = true
last = false
edges = []
for i in runlist.split(/,/)
print "#{i}\n" if Debuglevel > 0
if last
# nothing may follow a run that extends to +infinity
print "Set::IntSpan::_copy_run_list: Bad order: #{runlist}\n"
exit
end
if i =~ /^(-?\d+)$/
edges.push(($1.to_i-1), $1.to_i)
elsif i =~ /^(-?\d+)-(-?\d+)$/
if $1.to_i > $2.to_i
print "Set::IntSpan::_copy_run_list: Bad order: #{runlist}\n"
exit
end
edges.push(($1.to_i-1), $2.to_i)
elsif i =~ /^\(-(-?\d+)$/
unless first
print "Set::IntSpan::_copy_run_list: Bad order: #{runlist}\n"
exit
end
@set["negInf"] = true
edges.push($1.to_i)
elsif i =~ /^(-?\d+)-\)$/
edges.push(($1.to_i-1))
@set["posInf"] = true
last = true
elsif i =~ /^\(-\)$/
unless first
print "Set::IntSpan::_copy_run_list: Bad order: #{runlist}\n"
exit
end
@set["negInf"] = true
@set["posInf"] = true
last = true
else
print "Set::IntSpan::_copy_run_list: Bad syntax: #{runlist}\n"
exit
end
first = false
end
@set["edges"] = edges
@set["run"] = []
return true
end
# check for overlapping runs
# delete duplicate edges
# returns false if the edges are out of order, true otherwise
def _cleanup
edges = @set["edges"]
i = 0
while i < edges.length-1
case edges[i] <=> edges[i+1]
when -1
i += 1
when 0
edges.slice!(i..(i+1))
when 1
return false
end
end
true
end
#def splice(array, offset, length=nil, list=[])
# if offset >= 0
# length = array.length-offset unless length
# leftarray = array.slice(0, offset)
# rightarray = array.slice(offset+length, (array.length - offset))
# else
# length = array.length+offset unless length
# leftarray = array.slice(0, (array.length+offset))
# rightarray = array.slice(array.length+length+offset, array.length+offset)
# end
#
# array = leftarray
# array += list
# array += rightarray if rightarray
#
# return array
#end
def run_list
if empty
return @set["empty_string"]
end
print "edges leng: ", @set["edges"].length, "\n" if Debuglevel > 0
edges = []
edges.concat(@set["edges"])
runs = []
# add the infinity markers in place; ['(', edges] would nest the array
edges.unshift('(') if @set["negInf"]
edges.push(')') if @set["posInf"]
if edges.length > 0
print edges.join("/"),"\n" if Debuglevel > 0
while(edges.length>0)
print "edges leng: ", @set["edges"].length, "\n" if Debuglevel > 0
lower = edges.delete_at(0)
upper = edges.delete_at(0)
print "Lower: \"#{lower}\" Upper: \"#{upper}\"\n" if Debuglevel > 0
if ((lower.to_s <=> '(')!=0 and
(upper.to_s <=> ')')!=0 and
((lower+1) == upper))
print "#{upper}\n" if Debuglevel > 0
runs.push("#{upper}")
else
lower += 1 if (lower.to_s <=> "(")!=0
print "#{lower}-#{upper}\n" if Debuglevel > 0
runs.push("#{lower}-#{upper}")
end
end
end
print "edges leng: ", @set["edges"].length, "\n" if Debuglevel > 0
return runs.join(',')
end
def elements
if (@set["negInf"] == true or @set["posInf"] == true)
print "Set::IntSpan::elements: infinite set\n"
exit
end
elements = []
edges = @set["edges"].dup
while (edges.length>0)
lower, upper = edges.slice!(0..1)
elements += (lower+1 .. upper).to_a
end
return elements
end
def _real_set(set_spec=nil) # converts a set specification into a set
(set_spec != nil and set_spec.class.to_s == "Set::IntSpan") ?
set_spec :
IntSpan.new(set_spec)
end
def union(set_spec)
b = _real_set(set_spec)
s = IntSpan.new
s.set_neg_inf(@set["negInf"] || b.neg_inf)
eA = @set["edges"]
eB = b.edges
eS = s.edges
inA = @set["negInf"]
inB = b.neg_inf
iA = 0
iB = 0
while (iA < eA.length and iB < eB.length)
xA = eA[iA]
xB = eB[iB]
if (xA < xB)
iA += 1
inA = ! inA
not inB and eS.push(xA)
elsif (xB < xA)
iB += 1
inB = ! inB
not inA and eS.push(xB)
else
iA += 1
iB += 1
inA = ! inA
inB = ! inB
inA == inB and eS.push(xA)
end
end
iA < eA.length and (! inB) and eS.concat(eA[iA..eA.length])
iB < eB.length and (! inA) and eS.concat(eB[iB..eB.length])
s.set_pos_inf(@set["posInf"] || b.pos_inf)
s.set_edges(eS)
return s
end
def intersect(set_spec)
b = _real_set(set_spec)
s = IntSpan.new
s.set_neg_inf(@set["negInf"] && b.neg_inf)
eA = @set["edges"]
eB = b.edges
eS = s.edges
inA = @set["negInf"]
inB = b.neg_inf
iA = 0
iB = 0
while (iA < eA.length and iB < eB.length)
xA = eA[iA]
xB = eB[iB]
if (xA < xB)
iA += 1
inA = ! inA
inB and eS.push(xA)
elsif (xB < xA)
iB += 1
inB = ! inB
inA and eS.push(xB)
else
iA += 1
iB += 1
inA = ! inA
inB = ! inB
inA == inB and eS.push(xA)
end
end
iA < eA.length and inB and eS.concat(eA[iA..eA.length])
iB < eB.length and inA and eS.concat(eB[iB..eB.length])
s.set_pos_inf(@set["posInf"] && b.pos_inf)
s.set_edges(eS)
return s
end
def diff (set_spec)
b = _real_set(set_spec)
s = IntSpan.new
s.set_neg_inf(@set["negInf"] && ! b.neg_inf)
eA = @set["edges"]
eB = b.edges
eS = s.edges
inA = @set["negInf"]
inB = b.neg_inf
iA = 0
iB = 0
while (iA < eA.length and iB < eB.length)
xA = eA[iA]
xB = eB[iB]
if (xA < xB)
iA += 1
inA = ! inA
not inB and eS.push(xA)
elsif (xB < xA)
iB += 1
inB = ! inB
inA and eS.push(xB)
else
iA += 1
iB += 1
inA = ! inA
inB = ! inB
inA != inB and eS.push(xA)
end
end
iA < eA.length and not inB and eS.concat(eA[iA..eA.length])
iB < eB.length and inA and eS.concat(eB[iB..eB.length])
s.set_edges(eS)
s.set_pos_inf(@set["posInf"] && ! b.pos_inf)
return s
end
def xor(set_spec)
b = _real_set(set_spec)
s = IntSpan.new
s.set_neg_inf(@set["negInf"] ^ b.neg_inf)
eA = @set["edges"]
eB = b.edges
eS = s.edges
iA = 0
iB = 0
while (iA < eA.length and iB < eB.length)
xA = eA[iA]
xB = eB[iB]
if (xA < xB)
iA += 1
eS.push(xA)
elsif (xB < xA)
iB += 1
eS.push(xB)
else
iA += 1
iB += 1
end
end
iA < eA.length and eS.concat(eA[iA..eA.length])
iB < eB.length and eS.concat(eB[iB..eB.length])
s.set_pos_inf(@set["posInf"] ^ b.pos_inf)
s.set_edges(eS)
return s
end
def complement
# In the edge representation the complement has the same edges; only the
# two infinity flags are inverted.
comp = IntSpan.new
comp.set_edges(@set["edges"].dup)
comp.set_neg_inf(! @set["negInf"])
comp.set_pos_inf(! @set["posInf"])
return comp
end
def superset(set_spec)
b = _real_set(set_spec)
# $b->diff($a)->empty
s = b.diff(self)
return s.empty
end
def subset(set_spec)
b = _real_set(set_spec)
# $a->diff($b)->empty
s = diff(b)
return s.empty
end
def equal(set_spec)
b = _real_set(set_spec)
@set["negInf"] == b.neg_inf or return false
@set["posInf"] == b.pos_inf or return false
aEdge = @set["edges"]
bEdge = b.edges
aEdge.length == bEdge.length or return false
for i in (0...aEdge.length)
aEdge[i] == bEdge[i] or return false
end
return true
end
def equivalent(set_spec)
b = _real_set(set_spec)
cardinality == b.cardinality
end
def cardinality
(@set["negInf"] or @set["posInf"]) and return -1
car = 0
edges = @set["edges"]
i=0
while (i < edges.length)
lower = edges[i]
upper = edges[i+1]
car += upper - lower
i += 2
end
return car
end
def empty
# empty means: no edges and not infinite in either direction
if @set["negInf"] or @set["edges"].length > 0 or @set["posInf"]
return false
end
return true
end
def finite
if @set["negInf"] == false and @set["posInf"] == false
return true
end
return false
end
def edges
return @set["edges"]
end
def set_edges(edges)
@set["edges"] = edges
end
def neg_inf
return @set["negInf"]
end
def set_neg_inf(negInf)
@set["negInf"] = negInf
end
def pos_inf
return @set["posInf"]
end
def set_pos_inf(posInf)
@set["posInf"] = posInf
end
def infinite
@set["negInf"] or @set["posInf"]
end
def universal
@set["negInf"] and not @set["edges"].length > 0 and @set["posInf"]
end
def member(n)
inSet = @set["negInf"]
edge = @set["edges"]
for i in (0...edge.length)
if inSet
return true if n <= edge[i]
inSet = false
else
return false if n <= edge[i]
inSet = true
end
end
inSet
end
def insert(n)
inSet = @set["negInf"]
edge = @set["edges"]
if (edge.length == 0)
@set["edges"] = [n-1, n]
return
end
if n > edge[-1]+1
@set["edges"].push(n-1, n)
return
elsif n > edge[-1]
@set["edges"][-1] += 1
return
end
for i in (0...edge.length)
if (inSet)
n <= edge[i] and return
inSet = false
else
n <= edge[i] and break
inSet = true
end
end
inSet and return
lGap = i == 0 || n-1 - edge[i-1]
lGap = false if lGap == 0
rGap = i == edge.length-1 ? i : edge[i] - n
rGap = false if rGap == 0
if ( lGap and rGap)
lower = edge[0...i]
upper = edge[i...edge.length]
edge = lower
edge.push(n-1, n)
edge.concat(upper)
elsif (not lGap and rGap)
edge[i-1] += 1
elsif ( lGap and not rGap)
edge[i] -= 1
else
edge.delete_at(i-1)
edge.delete_at(i-1)
end
@set["edges"] = edge
end
def remove(n)
n or return
inSet = @set["negInf"]
edge = @set["edges"]
for i in (0...edge.length)
if (inSet)
break if n <= edge[i]
inSet = false
else
return if n <= edge[i]
inSet = true
end
end
return unless inSet
for i in (0...edge.length)
if edge[i] == n-1 and edge[i+1] == n
lower = edge[0...i]
upper = edge[i+2..edge.length]
edge = lower + upper
break
elsif edge[i] == n-1
edge[i] += 1
break
elsif edge[i] == n
edge[i] += 1
break
elsif edge[i+1] == n
edge[i+1] -= 1
break
elsif edge[i]<n and edge[i+1]>n
lower = edge[0..i]
upper = edge[i+1..edge.length]
edge = lower + [n-1, n] +upper
break
end
i += 1
end
@set["edges"] = edge
end
def min
empty and return nil
neg_inf and return nil
@set["edges"][0]+1
end
def max
empty and return nil
pos_inf and return nil
@set["edges"][-1]
end
def grep_set(block) # block: a Proc called with each element; a true result keeps it
return nil if @set["negInf"] or @set["posInf"]
edges = @set["edges"]
sub_edges = []
while (edges.length > 0)
lower = edges[0]
upper = edges[1]
edges = edges.slice(2..edges.length)
for i in (lower+1..upper)
block.call(i) or next
if (sub_edges.length > 0 and sub_edges[-1] == i-1)
sub_edges[-1] = i
else
sub_edges += [ i-1, i ]
end
end
end
sub_set = IntSpan.new
sub_set.set_edges(sub_edges)
sub_set
end
def map_set(block) # block: a Proc returning one integer or a list of integers
return nil if @set["negInf"] or @set["posInf"]
map_set = IntSpan.new
edges = @set["edges"]
while (edges.length > 0)
lower = edges[0]
upper = edges[1]
edges = edges.slice(2..edges.length)
for domain in (lower+1..upper)
for range in [block.call(domain)].flatten.compact
map_set.insert(range)
end
end
end
map_set
end
def first
@set["iterator"] = min
@set["run"] = []
@set["run"][0] = 0
@set["run"][1] = @set["edges"].length > 0 ? 1 : nil
@set["iterator"]
end
def last
lastEdge = @set["edges"].length - 1
@set["iterator"] = max
# assign the whole pair so this also works when first was never called
@set["run"] = [lastEdge > 0 ? lastEdge-1 : nil, lastEdge]
@set["iterator"]
end
def start(startval)
@set["iterator"] = nil
@set["run"] ||= []
startval or return nil
inSet = @set["negInf"]
edges = @set["edges"]
for i in (0...edges.length)
if (inSet)
if (startval <= edges[i])
@set["iterator"] = startval
@set["run"][0] = i > 0 ? i-1 : nil # 0 is truthy in Ruby, so test explicitly
@set["run"][1] = i
return startval
end
inSet = false
else
if (startval <= edges[i])
return nil
end
inSet = true
end
end
if (inSet)
@set["iterator"] = startval
@set["run"][0] = edges.length > 0 ? edges.length-1 : nil # index of the last edge
@set["run"][1] = nil
end
@set["iterator"]
end
def current
@set["iterator"]
end
def next
@set["iterator"] or return first
run1 = @set["run"][1]
# Ruby has no ++ operator; ++x is just two unary pluses
run1 or return (@set["iterator"] += 1)
edges = @set["edges"]
if (@set["iterator"] < edges[run1])
@set["iterator"] += 1
return @set["iterator"]
end
if (run1 < edges.length-2)
run0 = run1 + 1
@set["run"] = [run0, run0+1]
@set["iterator"] = edges[run0]+1
elsif (run1 < edges.length-1)
run0 = run1 + 1
@set["run"] = [run0, nil]
@set["iterator"] = edges[run0]+1
else
@set["iterator"] = nil
end
@set["iterator"]
end
def prev
@set["iterator"] or return last
run0 = @set["run"][0]
# Ruby has no -- operator; --x is just two unary minuses
run0 or return (@set["iterator"] -= 1)
edges = @set["edges"]
if (@set["iterator"] > edges[run0]+1)
@set["iterator"] -= 1
return @set["iterator"]
end
if (run0 > 1)
run1 = run0 - 1
@set["run"] = [run1-1, run1]
@set["iterator"] = edges[run1]
elsif (run0 > 0)
run1 = run0 - 1
@set["run"] = [nil, run1]
@set["iterator"] = edges[run1]
else
@set["iterator"] = nil
end
@set["iterator"]
end
end # class
end # module
# TODO
# Do not kill an item until it's tested!
# [x] new
# [x] valid
# [ ] copy
# [ ] _copy_empty # makes $set the empty set
# [x] _copy_array # copies an array into a set
# [ ] _copy_set # copies one set to another
# [ ] _copy_run_list # parses a run list
# [ ] _cleanup
# [x] run_list
# [x] elements
# [x] _real_set # converts a set specification into a set
# [x] union
# [x] intersect
# [x] diff
# [x] xor
# [ ] complement
# [x] superset
# [x] subset
# [x] equal
# [x] equivalent
# [x] cardinality
# [x] empty
# [x] finite
# [x] neg_inf { shift->{negInf} }
# [x] pos_inf { shift->{posInf} }
# [x] infinite
# [ ] universal
# [x] member
# [x] insert # way too much code, I think
# [x] remove
# [x] min
# [x] max
# [ ] grep_set(&$)
# [ ] map_set(&$)
# [x] first($)
# [x] last($)
# [ ] start($$)
# [x] current($) { shift->{iterator} }
# [x] next($)
# [x] prev($)
# New methods
# [x] set_neg_inf
# [x] set_pos_inf
# [x] set_edges
# [x] edges
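
The methods above all work on a flat, sorted edge array in which each pair (lo, hi) stands for the run lo+1..hi. The following is a minimal standalone sketch of that representation for finite sets only. The helper names (`edges_from_elements`, `edge_member?`, `edge_min`, `edge_max`) are hypothetical and are not part of set/intspan.rb.

```ruby
# Hypothetical helpers, not part of set/intspan.rb.
# A finite set is a flat, sorted array of edges; each pair (lo, hi)
# stands for the run lo+1..hi, so {3, 4, 5, 9} becomes [2, 5, 8, 9].

def edges_from_elements(elements)
  edges = []
  elements.sort.uniq.each do |n|
    if !edges.empty? && edges[-1] == n - 1
      edges[-1] = n          # n extends the current run
    else
      edges.push(n - 1, n)   # n starts a new run
    end
  end
  edges
end

def edge_member?(edges, n)
  in_set = false             # a finite set starts outside any run
  edges.each do |e|
    return in_set if n <= e
    in_set = !in_set         # crossing an edge toggles membership
  end
  false                      # finite set: nothing lies past the last edge
end

def edge_min(edges)
  edges.empty? ? nil : edges[0] + 1
end

def edge_max(edges)
  edges.empty? ? nil : edges[-1]
end

edges = edges_from_elements([9, 3, 5, 4])
p edges
p edge_member?(edges, 4)
p edge_member?(edges, 6)
```

The membership scan mirrors the first loop in `remove`: walk the edges in order and flip an in-set flag at each one until an edge at or above `n` is reached.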

@ -0,0 +1,13 @@
#!/usr/local/bin/ruby
require 'set/intspan'
@set = Set::IntSpan.new("895738,895742,895747,895760-895761,895763-895765,895775-895776,895783,895786")
@set.finite
@set.insert(895739)
@set.insert(895740)
@set.insert(895741)
@set2 = Set::IntSpan.new("895759-900000")
@set3 = @set2.diff(@set)
print @set3.run_list
print "\n"
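
One way to cross-check the script above without loading set/intspan is to compute the same difference with plain arrays. This is only a sketch: the `run_list` helper here is a hypothetical stand-in, and it assumes the port formats run lists the way the Perl module does (single values and `lo-hi` ranges joined by commas).

```ruby
# Hypothetical cross-check; uses only core Ruby, not set/intspan.
def run_list(elements)
  runs = []
  elements.sort.uniq.each do |n|
    if !runs.empty? && runs[-1][1] == n - 1
      runs[-1][1] = n        # n extends the current run
    else
      runs << [n, n]         # n starts a new run
    end
  end
  runs.map { |lo, hi| lo == hi ? lo.to_s : "#{lo}-#{hi}" }.join(",")
end

# Same elements as the test script, including the three inserted values.
set = [895738, 895742, 895747, 895760, 895761, 895763, 895764, 895765,
       895775, 895776, 895783, 895786, 895739, 895740, 895741]
set2 = (895759..900000).to_a

expected = run_list(set2 - set)
puts expected
```

Under those assumptions both scripts should print `895759,895762,895766-895774,895777-895782,895784-895785,895787-900000`.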

@ -0,0 +1,30 @@
#!/usr/bin/perl
my %servers;
$cachefile = $ARGV[0];
if (@ARGV != 1) {
print "Usage: $0 <cachefile>\n";
exit;
}
print "Group: $groupname\n";
while (<>) {
# skip lines that do not have at least four |-separated fields
next unless /^([^|]*)\|([^|]*)\|([^|]*)\|(.*)/;
# autovivification creates the per-server list on first use
push @{$servers{$3}}, "$1|$2|$4\n";
}
foreach (keys %servers) {
print ":KEY: $_\n";
open FH, ">$cachefile.$_" or die "Couldn't write new cachefile $cachefile.$_: $!\n";
foreach (@{$servers{$_}}) {
print FH;
}
close FH;
}
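
The grouping step of the converter above can also be sketched in Ruby, the language of the rest of the project. The function name `split_by_server` and the sample lines are made up for illustration; the script does not document what the fields mean, so they are only referred to by position.

```ruby
# Hypothetical Ruby rendering of the grouping step only (no file I/O).
# The third |-separated field selects the server; it is dropped from
# the line that is stored for that server.
def split_by_server(lines)
  servers = Hash.new { |hash, key| hash[key] = [] }
  lines.each do |line|
    m = /^([^|]*)\|([^|]*)\|([^|]*)\|(.*)/.match(line)
    next unless m                      # skip lines with fewer than four fields
    servers[m[3]] << "#{m[1]}|#{m[2]}|#{m[4]}\n"
  end
  servers
end

# Made-up sample input, not real cache data.
sample = [
  "1|a|news.example.org|x|y\n",
  "2|b|other.example.org|z\n",
  "garbage\n",
  "3|c|news.example.org|w\n"
]

grouped = split_by_server(sample)
grouped.each do |server, entries|
  puts "#{server}: #{entries.length} line(s)"
end
```

Each key of the returned hash would correspond to one output file named after the cache file plus the server, as in the Perl loop.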