The aim of this project is to create software that serves as a platform for testing various lossless compression methods and their applications. We have focused on the compression of large XML files. The most important factor is the compression ratio, and therefore the best-known approaches and methods were used. We combined methods for XML compression and text compression.
We have implemented a parser that is able to parse non-well-formed XML files. On the output of the parser we have tried several standard methods: block compression, dictionary methods and statistical methods. We have also tried combinations of methods not presented before, such as block compression followed by statistical methods.
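To illustrate what "block compression followed by statistical methods" can look like, here is a minimal sketch in Python. It is not XBW's implementation: it assumes the block step is a naive Burrows-Wheeler transform followed by move-to-front coding, and it stands in for the statistical coder with an order-0 entropy estimate. All function names (`bwt`, `inverse_bwt`, `move_to_front`, `order0_entropy_bits`) and the sample input are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def bwt(block: bytes) -> tuple[bytes, int]:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column."""
    n = len(block)
    if n == 0:
        return b"", 0
    # Sort rotation start offsets by the rotation they denote.
    # This is quadratic in memory and time, so it is for illustration only.
    order = sorted(range(n), key=lambda i: block[i:] + block[:i])
    last = bytes(block[(i - 1) % n] for i in order)
    # Second value: row of the original block among the sorted rotations.
    return last, order.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Invert bwt() using the stable-sort mapping from first to last column."""
    n = len(last)
    nxt = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = nxt[row]
        out.append(last[row])
    return bytes(out)


def move_to_front(data: bytes) -> list[int]:
    """Replace each byte by its position in a recency list (small values for runs)."""
    table = list(range(256))
    out = []
    for b in data:
        idx = table.index(b)
        out.append(idx)
        table.pop(idx)
        table.insert(0, b)
    return out


def order0_entropy_bits(symbols) -> float:
    """Bits an ideal order-0 statistical coder would need (assumed stand-in)."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical sample input, not taken from the XBW test corpus.
    sample = b"<item><name>x</name></item>" * 20
    transformed, primary = bwt(sample)
    assert inverse_bwt(transformed, primary) == sample
    print("order-0 bits, raw input:   ", round(order0_entropy_bits(sample)))
    print("order-0 bits, BWT then MTF:", round(order0_entropy_bits(move_to_front(transformed))))
```

Running the script prints the estimated size of the raw input and of the transformed input, so the effect of the block step on a later statistical coder can be compared directly. A real compressor would use a suffix-array-based transform and an actual arithmetic or Huffman coder instead of the estimate used here.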
We are happy to announce that XBW has the best compression ratio for the files it was designed for: large XML files. For other types of files, however, XBW does not perform as well.
The biggest strength of XBW seems to be large XML files in languages that use non-ASCII characters. On our test corpus, which contains several Czech XML files of about 20 MB each, we have beaten bzip2 by 81% and rar by 45%. For such files, our program is the best choice we are aware of.
The source code of XBW is available on the SourceForge download page.
XBW user guide (PDF)
XBW user guide (HTML)
The user guide contains program requirements, installation instructions, simple usage examples and a full description of program arguments.
XBW documentation (PDF)
The documentation contains the following sections:
The XBW project team has the following members:
Please contact one of the authors directly via email if you want to submit bug reports, request new features, etc. We will be very happy to receive your comments.