Mail size issue with attachment in Fossa and thunderbird
-
neozimpi
Mail size issue with attachment in Fossa and thunderbird
I found a curious issue regarding mail size.
If someone has noticed the same thing and knows how and why it happens, an explanation would be nice.
The issue:
Mail size with no attachment is about 3,17 KB (3.251 Bytes)
Mail with attachment is 38,1 MB (38075102 Bytes), where the attached file is only
27,8 MB (27820247 Bytes). Why is a mail with an attachment this big, and why does it get even bigger
when the attachment is bigger? I calculated a 30 percent increase of the mail size relative to
the attachment size. This causes problems when sending mail to an SMTP server with a size limit
of 20 MB: a 19 MB file cannot be sent because the mail itself is even larger. This happens only in
Thunderbird and FossaMail.
-
Moonchild
- Project founder

- Posts: 39080
- Joined: 2011-08-28, 17:27
- Location: Sweden
Re: Mail size issue with attachment in Fossa and thunderbird
No, this is not only in FossaMail and Thunderbird.
Mail Size is the size of the raw mail as-sent. Binary attachments (anything but text) are normally sent base64-encoded, which has a 4:3 overhead (3 bytes of the original file become 4 bytes in the mail). This is rooted in the fact that e-mail is a text-only based medium and cannot support 8-bit content natively.
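To check the 4:3 figure against the numbers reported above, here is a rough sketch in Python. It assumes the usual MIME layout of 76 base64 characters per line, each line terminated by CRLF; that layout and the function name `base64_wire_size` are assumptions for illustration, not something stated in this thread.

```python
LINE_CHARS = 76   # assumed: base64 characters per line in a MIME body
LINE_BREAK = 2    # assumed: CRLF after every line


def base64_wire_size(raw_bytes):
    """Estimate the size of an attachment once it is base64-encoded for mail."""
    # Every 3 input bytes (rounded up) become 4 output characters.
    encoded = 4 * ((raw_bytes + 2) // 3)
    # The encoded text is broken into lines, each ending with a line break.
    lines = (encoded + LINE_CHARS - 1) // LINE_CHARS
    return encoded + lines * LINE_BREAK


attachment = 27820247  # attachment size reported in the first post
wire = base64_wire_size(attachment)
print(wire, wire / attachment)
```

Under these assumptions the estimate for the 27820247-byte file is 38069814 bytes, which is within a few kilobytes of the reported 38075102; the remainder would plausibly be headers and the message text. With line breaks included, each 57 bytes of the file take 78 bytes in the mail, so the effective overhead is closer to 37% than 33%.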
"There is no point in arguing with an idiot, because then you're both idiots." - Anonymous
"Seek wisdom, not knowledge. Knowledge is of the past; wisdom is of the future." -- Native American proverb
"Linux makes everything difficult." -- Lyceus Anubite
-
neozimpi
Re: Mail size issue with attachment in Fossa and thunderbird
Thanks, this confirms my hypothesis.
Is there any way to avoid it, or to change it on the mail server using amavis or some other filter system?
-
squarefractal
Re: Mail size issue with attachment in Fossa and thunderbird
Moonchild wrote: This is rooted in the fact that e-mail is a text-only based medium and cannot support 8-bit content natively.
So, no.
Of course, compression is of help here but I assume that you've already tried that.
-
neozimpi
Re: Mail size issue with attachment in Fossa and thunderbird
Thanks.
Yeah, compression is on.
So this "problem" concerns everybody with big attachments.
Someone has to build a new mail system.
-
Moonchild
Re: Mail size issue with attachment in Fossa and thunderbird
Actually, I already found a partial solution for this a decade or two ago. I've never really published it though, so obviously there has been absolutely zero adoption of my idea.
-
squarefractal
Re: Mail size issue with attachment in Fossa and thunderbird
Moonchild wrote: I've never really published it though
Would you be willing to publish your solution at this point in time? It would be interesting to have a look at.
-
Moonchild
Re: Mail size issue with attachment in Fossa and thunderbird
Oh the idea was really simple, actually.
As an extension to the BASE64 index, 24 additional characters would be used, coming to 88 (still within the safe range for lower range ASCII).
These additional characters would not be used to do the 4:3 hashing, but would rather be used after this hashing was done to run-length compress the resulting character string.
The threshold would be three or more equal characters in the BASE64 output, which are replaced with an extra character encoding the count, followed by the character that was compressed. The result is a 2-character string representing a run of 3-26 identical characters.
Example, if {=5 }=6 and [=7
Bin -> AAAAABBBBBBBCCCCCCDDDDD -> {A[B}C{D
...reducing 23 characters to 8 characters.
Yes, this has the potential of reducing the encoded file to be smaller than the original binary file.
Why 3 or more? Because anything less would have no net gain.
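The run-length step described above can be sketched in a few lines of Python (used here instead of the original Turbo Pascal so it can be run as-is). The `MARKERS` string is a hypothetical choice of 24 characters outside the base64 alphabet, arranged so that `{`, `}` and `[` stand for 5, 6 and 7 as in the example; the real character set is not given in this thread. Since 24 markers can only distinguish 24 run lengths, the sketch assumes they cover runs of 3 to 26.

```python
from itertools import groupby

# Hypothetical marker set: 24 characters that never occur in base64 output.
# MARKERS[0] stands for a run of 3, MARKERS[1] for 4, and so on up to 26.
MARKERS = "!#{}[]()<>@$%&*-_.,:;?^~"
MIN_RUN = 3
MAX_RUN = MIN_RUN + len(MARKERS) - 1


def rle_compress(text):
    """Replace each run of 3 or more equal characters with marker + character."""
    out = []
    for char, group in groupby(text):
        run = len(list(group))
        # Long runs are split into chunks of at most MAX_RUN characters.
        while run >= MIN_RUN:
            chunk = min(run, MAX_RUN)
            out.append(MARKERS[chunk - MIN_RUN] + char)
            run -= chunk
        # Fewer than 3 left over: copy them unchanged, there is nothing to gain.
        out.append(char * run)
    return "".join(out)


def rle_expand(text):
    """Inverse of rle_compress."""
    out = []
    chars = iter(text)
    for char in chars:
        if char in MARKERS:
            # A marker is always followed by the character it repeats.
            out.append(next(chars) * (MARKERS.index(char) + MIN_RUN))
        else:
            out.append(char)
    return "".join(out)


print(rle_compress("AAAAABBBBBBBCCCCCCDDDDD"))  # prints {A[B}C{D
```

Because the markers are outside the base64 alphabet, the expander can tell a marker from data without any escaping, which is what keeps the scheme this simple.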
The second part of the idea I incorporated was making use of the very common bias for higher/lower order bits to be present in source files (if they aren't compressed files already themselves -- but even then the header usually isn't so can be reduced in size), so instead of using straight-up BASE64 encoding, the string would be "bitsorted", grouping the resulting characters together by order:
BASE64: Bin -> ABCDABCDABCDABCDABCD
Bitsorted: Bin -> AAAAABBBBBCCCCCDDDDD
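As a rough illustration of the "bitsorting" step, the Python sketch below regroups an already base64-encoded line so that the first characters of every 4-character group come first, then the second characters, and so on. The function names `bitsort` and `unbitsort` and the sample data are made up for this example, and it uses Python's standard `base64` module rather than the original Hash3() routine.

```python
import base64


def bitsort(encoded):
    """Group base64 characters by their position (1st..4th) within each quad."""
    return encoded[0::4] + encoded[1::4] + encoded[2::4] + encoded[3::4]


def unbitsort(sorted_text):
    """Inverse of bitsort; the length must be a multiple of 4."""
    n = len(sorted_text) // 4
    columns = [sorted_text[i * n:(i + 1) * n] for i in range(4)]
    # Interleave the four columns back into the original quads.
    return "".join("".join(quad) for quad in zip(*columns))


print(bitsort("ABCDABCDABCDABCDABCD"))  # prints AAAAABBBBBCCCCCDDDDD

# One 57-byte line of sample data: 19 small values, each padded with two
# zero bytes, which is the kind of bit bias the idea tries to exploit.
line = bytes(b for value in range(19) for b in (0, 0, value))
encoded = base64.b64encode(line).decode("ascii")
sorted_line = bitsort(encoded)
print(encoded)
print(sorted_line)
```

In the plain encoding of this sample the zero bytes are scattered as short runs of `A` between the changing characters, while after bitsorting they form one long run, which is what makes the run-length step worthwhile.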
Written in TP at the time, here are the relevant code snippets (these aren't the latest and using the compression string inefficiently, only up to 24, but I can't find all my 1996 files at the moment ;P ):
Compression:
"CompCodeString" is a string holding the extra characters. Checksum(), used in the bitsorting snippet, is an extra thing I built into my encoder to verify integrity and can be ignored.
```pascal
{ Run-length compresses s in place. CompCodeString is declared outside
  this snippet; CompCodeString[n-1] marks a run of n equal characters. }
procedure CompCodedString(var s:string);
var
  i,cnt:word;
  cmps:string;
  cc:char;
begin
  cmps:='';
  cnt:=0;
  for i := 1 to length(s) do
  begin
    if cnt=0 then {start string}
    begin
      cc:=s[i];
      cnt:=1;
    end
    else if s[i]=cc then {equal char?}
      Inc(cnt)
    else if cnt=1 then {just one equal char?}
    begin
      cmps:=cmps+cc;
      cc:=s[i];
    end
    else
    begin
      while cnt>25 do {long series?}
      begin
        cmps:=cmps+CompCodeString[24]+cc;
        cnt:=cnt-25; {was 24; index 24 marks a run of 25, as in [cnt-1] below}
      end;
      if cnt=1 then {1 left after?}
        cmps:=cmps+cc
      else if cnt<>0 then {some left?}
        cmps:=cmps+CompCodeString[cnt-1]+cc;
      cc:=s[i];
      cnt:=1;
    end;
  end;
  {now make sure restvalues are dealt with}
  while cnt>25 do
  begin
    cmps:=cmps+CompCodeString[24]+cc;
    cnt:=cnt-25; {was 24, see above}
  end;
  if cnt=1 then
    cmps:=cmps+cc
  else if cnt<>0 then
    cmps:=cmps+CompCodeString[cnt-1]+cc;
  s:=cmps;
end;
```
Bitsorting+encoding:
```pascal
{ Encodes one checksummed block from bp^ and writes it to f.
  "check" is declared outside this snippet. }
procedure CFBBitSortC(bp:BinBlockP; var f:text);
var
  cline:String;
  l,w:word;
  addr:word;
  b1,b2,b3:byte;
  c1,c2,c3,c4:char;
  cl1,cl2,cl3,cl4:String;
begin
  addr:=0;
  check:=0;
  for l:=1 to MSCode_Lines do
  begin
    cline:='';
    cl1:='';
    cl2:='';
    cl3:='';
    cl4:='';
    for w:=1 to (MSCode_LineWidth div 3) do
    begin
      {read 3 bytes, updating the checksum for each}
      b1:=bp^[addr];
      check:=Checksum(check,b1);
      Inc(addr);
      b2:=bp^[addr];
      check:=Checksum(check,b2);
      Inc(addr);
      b3:=bp^[addr];
      check:=Checksum(check,b3);
      Inc(addr);
      {3 bytes -> 4 base64 characters, collected per position}
      Hash3(b1,b2,b3,c1,c2,c3,c4);
      cl1:=cl1+c1;
      cl2:=cl2+c2;
      cl3:=cl3+c3;
      cl4:=cl4+c4;
    end;
    {bitsorted line: all 1st characters, then all 2nd, 3rd and 4th}
    cline:=cl1+cl2+cl3+cl4;
    CompCodedString(cline);
    WriteLn(f,cline);
  end;
  WriteLn(f,check);
  WriteLn('Next addr:',addr);
end;
```
MSCode_Lines is the number of lines per checksummed block, not really relevant for the actual encoding.
MSCode_LineWidth is the line width used in the base64 hashing (default 57 bytes, leading to 76 characters hashed).
Hash3() is the actual standard base64 hashing routine.