Text Compression - What algorithm to use?
I have to store some text data of the form
`70,165,531,0 | 70,166,562 | "Hi", 167,578. 70,171,593 71,179,593 | 73,188,609. "One", 1,3.` The data contains a few thousand characters (around 10,000 - 50,000).
I have read about various compression algorithms, but I cannot decide which one to use here.
The important thing is: the compressed string should contain only alphanumeric characters (or a few special characters like + - / & % @ $ ...). I mean, most algorithms produce arbitrary ASCII characters as compressed data, right? That should be avoided. Can anyone guide me on how to proceed here?
PS: digits, ',' and '|' are the main characters; other characters are very rare.
Actually, your requirement to restrict the output to printable characters will cost you about 25% of your compression gain, because you use only about 6 out of 8 bits per output character.
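The 25% figure follows from simple bit arithmetic: an alphabet of 64 printable symbols (as in Base64) carries log2(64) = 6 bits per character, versus 8 bits for an unrestricted byte. A quick sketch of that calculation:

```python
import math

# 64 printable symbols carry log2(64) = 6 bits each,
# versus 8 bits for an unrestricted byte.
bits_per_symbol = math.log2(64)        # 6.0
efficiency = bits_per_symbol / 8       # 0.75
loss = 1 - efficiency                  # 0.25 -> the ~25% mentioned above
print(f"efficiency: {efficiency:.0%}, loss: {loss:.0%}")
```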
But if you really want to, you can always convert the raw bitstream to printable characters with Base64 (or something more space-efficient).
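As a minimal sketch of that pipeline in Python, using the stdlib `zlib` and `base64` modules (the sample string is taken from the question; note that the standard Base64 alphabet uses only alphanumerics plus '+', '/' and '=', which fits the allowed character set):

```python
import base64
import zlib

sample = '70,165,531,0 | 70,166,562 | "Hi", 167,578. 70,171,593 71,179,593'

# Compress to raw bytes, then map those bytes to a printable alphabet.
compressed = zlib.compress(sample.encode("utf-8"), level=9)
printable = base64.b64encode(compressed).decode("ascii")

# Reverse the pipeline to recover the original text.
restored = zlib.decompress(base64.b64decode(printable)).decode("utf-8")
assert restored == sample
print(printable)
```

On such a short input the Base64 overhead can outweigh the compression gain; the technique pays off at the 10,000-50,000 character sizes mentioned in the question.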
As for the compression algorithm itself, pick one of the better-known ones with well-tested open source code, such as gzip or bzip2. Choosing the "best" algorithm is not really easy; here is part of the list of questions you have to ask yourself:
- Do I need the best compression ratio, or the best encoding or decoding speed (compression and decompression speed are often quite asymmetric)?
- How important is memory efficiency for the encoder and the decoder?
- Is code size important, e.g., for embedded use?
- Do I want well-tested code for the encoder and/or decoder, in one particular language or in several?
- and so on
The bottom line: take a representative sample of your data, run some tests with the existing algorithms, and benchmark them on the criteria that matter for your use case.
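Such a benchmark can be sketched with the compressors in the Python standard library; the sample below is a hypothetical stand-in built by repeating a line resembling the question's data until it reaches roughly the stated size range:

```python
import bz2
import lzma
import zlib

# Hypothetical sample: repeat a data-like line to reach a few tens of KB.
sample = ('70,165,531,0 | 70,166,562 | "Hi", 167,578. '
          '70,171,593 71,179,593 | 73,188,609. "One", 1,3. ') * 1000
data = sample.encode("utf-8")

# Compare compressed size for each algorithm on the same input.
for name, compress in (("zlib", zlib.compress),
                       ("bz2", bz2.compress),
                       ("lzma", lzma.compress)):
    out = compress(data)
    ratio = len(out) / len(data)
    print(f"{name}: {len(out)} bytes ({ratio:.1%} of original)")
```

Extending this with timing (e.g., `time.perf_counter()` around compress and decompress calls) would also expose the speed asymmetries mentioned above.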