Item1498: viewfile corrupts binary files that don't have known extensions
Priority: Enhancement
Current State: No Action Required
Released In: n/a
Target Release: n/a
Applies To: Engine
Component: viewfile
Branches:
After changing Apache to rewrite pub file access to viewfile, we've been getting occasional reports of corrupted files. The viewfile script assumes that a file has MIME type text/plain and then changes the type to match the file's suffix, using the list of well-known suffixes provided in data/mime.types.
When Apache is configured to rewrite pub access to viewfile, we begin to see file corruption on downloaded files.
To recreate the problem, take a file attachment such as pub/System/FileAttachment/Smile.gif and copy it to pub/System/FileAttachment/Smile.
Access the file Smile directly. Apache "mime magic" recognizes the file type and serves it as
Content-Type: image/gif
Access the same file using the viewfile script and it is served as
Content-Type: text/plain; charset=ISO-8859-1
(Note that Smile doesn't actually get corrupted. However, the Konqueror browser detects a binary file served with a plain-text content type and shows a pop-up warning that the content is corrupted.)
I suspect that application/octet-stream might be a safer default for unknown file types. A preferable longer-term solution would be Apache-style file type magic, but that would probably add more overhead to an already slow process.
--
GeorgeClark - 23 Apr 2009
We could use some Perl modules to do that, such as CPAN:File::MMagic or, for speed, CPAN:File::MMagic::XS. Both aim to be rewrites of Apache's mod_mime_magic.
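For illustration, the core idea behind mod_mime_magic and File::MMagic is matching a file's leading bytes against a table of known signatures. The Python sketch below is not either module's API: the function names and the deliberately tiny signature table are hypothetical, although the byte signatures themselves are the standard GIF, PNG, JPEG, PDF and gzip headers.

```python
from typing import Optional

# Hypothetical, deliberately tiny signature table. Real magic databases
# (mod_mime_magic, File::MMagic, libmagic) hold far more entries and also
# test bytes at non-zero offsets.
MAGIC_SIGNATURES = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"\x1f\x8b", "application/x-gzip"),
]


def sniff_mime_type(head: bytes) -> Optional[str]:
    """Return the MIME type whose signature prefixes `head`, or None."""
    for signature, mime_type in MAGIC_SIGNATURES:
        if head.startswith(signature):
            return mime_type
    return None


def sniff_file(path: str, read_size: int = 16) -> Optional[str]:
    """Read only the first few bytes of the file and sniff them."""
    with open(path, "rb") as fh:
        return sniff_mime_type(fh.read(read_size))
```

Reading only the first 16 bytes keeps the per-request cost small. Applied to the Smile copy from the recreation steps, sniff_file should report image/gif, since the file still begins with a GIF header.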
I'm strongly against defaulting to anything other than text/plain, for security reasons.
--
OlivierRaginel - 23 Apr 2009
I suppose the right solution is to "do as Apache does", but as long as the MIME type doesn't trigger browser execution of the file, I don't understand the security implications. I don't have a strong opinion either way, but I'd like to understand them better. If I give the file a ".bin" suffix, it is served as an octet stream and the browser shows a "Save file as" dialog.
It does seem to vary by browser, however. Firefox seems to examine the signature of the file to determine its type, at least somewhat independently of the MIME type, and will display the file if it's a known type.
--
GeorgeClark - 23 Apr 2009
All popular browsers perform some degree of file sniffing -- in particular to detect that a file is of a particular graphics file format. IE is of course infamous for sniffing files served as text/plain and determining that they are HTML. I agree with George that application/octet-stream shouldn't be any less secure than text/plain -- no browser that pays the slightest amount of attention to security should be trying to arbitrarily execute an application/octet-stream file.
At least in a public installation, I think file sniffing would increase the opportunities for attacks, so if this feature were added, it should be configurable. I'm not terribly in favour of it, though I realize it fits in with a user-friendly "do what I mean" philosophy. I think the user needs to have the power (and, as a result, the responsibility) to tag their content properly.
--
IsaacLin - 23 Apr 2009
If a document that should not be treated as text gets converted as "text", the result is a disaster.
Binary must be the default, IMHO.
Where is the default to text coded?
--
KennethLavrsen - 23 Apr 2009
The subroutine _suffixToMimeType in /lib/Foswiki/UI/Viewfile.pm sets the MIME type to text/plain and then overrides it with the type detected from the file suffix. A missing or unknown suffix results in text/plain.
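A Python sketch of the same suffix lookup, with the fallback made a parameter so the text/plain and application/octet-stream positions in this thread can be compared. It assumes data/mime.types uses the Apache format (a MIME type followed by its suffixes, with # comments). The function names are hypothetical, and this is not a translation of the actual Perl code.

```python
import os


def parse_mime_types(text: str) -> dict:
    """Build a suffix -> MIME type map from Apache-style mime.types content."""
    suffix_map = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # blank or comment-only line
        mime_type, *suffixes = line.split()
        for suffix in suffixes:
            suffix_map[suffix.lower()] = mime_type
    return suffix_map


def suffix_to_mime_type(filename: str, suffix_map: dict,
                        default: str = "application/octet-stream") -> str:
    """Map a file name to a MIME type by suffix, falling back to `default`."""
    ext = os.path.splitext(filename)[1]  # ".gif", or "" when there is no suffix
    return suffix_map.get(ext[1:].lower(), default)
```

Passing default="text/plain" reproduces the behaviour described above; the application/octet-stream default reflects the proposal in this item, not what viewfile currently does.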
There is probably some performance optimization that could be done here as well. It reads the MimeTypesFileName file for every attachment, so on any page with lots of embedded files, it gets read once per file. The MIME types could probably be cached by the session to reduce I/O.
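One way to avoid re-reading the file for every attachment, sketched in Python: parse mime.types once per process and reuse the result, re-parsing only when the file's modification time changes. This caches per process rather than per session as suggested above; the function names are hypothetical, and the parser is a minimal reader for the Apache mime.types format.

```python
import functools
import os


def _parse_mime_types(text: str) -> dict:
    """Minimal Apache-style mime.types parser: suffix -> MIME type."""
    suffix_map = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # strip comments, then tokenize
        for suffix in fields[1:]:
            suffix_map[suffix.lower()] = fields[0]
    return suffix_map


@functools.lru_cache(maxsize=8)
def _load_cached(path: str, mtime: float) -> dict:
    # mtime is part of the cache key, so editing the file forces a re-parse.
    with open(path, encoding="utf-8") as fh:
        return _parse_mime_types(fh.read())


def load_mime_types(path: str) -> dict:
    """Return the parsed map; callers should treat it as read-only."""
    return _load_cached(path, os.path.getmtime(path))
```

A stat() per attachment is still needed to read the modification time, but that is much cheaper than reading and parsing the whole file each time.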
--
GeorgeClark - 23 Apr 2009
Apache, with mod_mime and mod_mime_magic installed:
- attempts to determine the MIME type from the suffix (mod_mime);
- if the type is not determined, examines the file magic to try to determine it (mod_mime_magic).
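The two-stage lookup above can be sketched in Python: suffix first, magic second, then a fallback. The suffix map and signature list are tiny hypothetical excerpts, the function name is hypothetical, and the fallback defaults to text/plain only to mirror Apache's DefaultType when it is left unset.

```python
import os
from typing import Optional

# Hypothetical excerpts; a real setup would load these from mime.types
# and a full magic database.
SUFFIX_MAP = {"gif": "image/gif", "tgz": "application/x-gzip", "txt": "text/plain"}
MAGIC_SIGNATURES = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
]


def resolve_mime_type(filename: str, head: bytes,
                      default: Optional[str] = "text/plain") -> Optional[str]:
    """Suffix lookup (like mod_mime), then magic (like mod_mime_magic), then default."""
    suffix = os.path.splitext(filename)[1][1:].lower()
    if suffix in SUFFIX_MAP:
        return SUFFIX_MAP[suffix]
    for signature, mime_type in MAGIC_SIGNATURES:
        if head.startswith(signature):
            return mime_type
    return default  # None would mean "send no Content-Type at all"
```

Note the ordering: a known suffix always wins, so a GIF saved under a hypothetical name like foo.txt would still be served as text/plain, just as Apache would serve it.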
Here is what the Apache docs say about the matter:
http://httpd.apache.org/docs/2.1/mod/core.html#defaulttype
In cases where it can neither be determined by the server nor the administrator (e.g. a proxy), it is preferable to omit the MIME type altogether rather than provide information that may be false. This can be accomplished using DefaultType None
However, if it is not configured, DefaultType does indeed default to text/plain.
So the default used by viewfile appears to be consistent with the Apache configuration when mod_mime_magic is not installed. But the Apache recommendation is to provide no type rather than an incorrect one.
So this should probably be configurable. And if viewfile is going to become more important, then something equivalent to mod_mime_magic is probably needed as well. Changing to an enhancement request.
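A configurable default might look like the following Python sketch, where None plays the role of Apache's DefaultType None and suppresses the header entirely. The helper name is hypothetical, and appending a charset only to text/* types is an assumption chosen so that a binary type never picks up a charset like the ISO-8859-1 seen in the original report.

```python
from typing import Dict, Optional


def content_type_headers(mime_type: Optional[str],
                         charset: Optional[str] = None) -> Dict[str, str]:
    """Build the Content-Type header; None means omit it (like DefaultType None)."""
    if mime_type is None:
        return {}  # let the browser decide rather than send a possibly false type
    if charset and mime_type.startswith("text/"):
        # Only textual types get a charset parameter.
        return {"Content-Type": f"{mime_type}; charset={charset}"}
    return {"Content-Type": mime_type}
```

Which configuration setting would supply mime_type for unknown files is left open here; no particular setting name is implied.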
Here are some other references:
--
GeorgeClark - 24 Apr 2009
Item1802 related? Is "tgz" unknown to foswiki.org?
--
OliverKrueger - 09 Jul 2009
Not sure. Checking the mime.types file, tgz is a covered suffix. I also followed the steps to recreate Item1802: with Wireshark, the file was downloaded as type application/x-gzip, which is correct.
I don't believe that this is the issue.
--
GeorgeClark - 09 Jul 2009
I wonder if on Linux (or wherever it's supported) we could use file --mime-type (or the library it uses) to improve our hit rate.
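A Python sketch of shelling out to file --mime-type as suggested. It adds --brief so only the type is printed, and returns None when the file utility is missing or fails, so the caller can fall back to the suffix lookup. The function name is hypothetical. Spawning a process per download adds exactly the kind of overhead mentioned earlier in this item; binding to the underlying library (libmagic) directly would avoid that.

```python
import shutil
import subprocess
from typing import Optional


def mime_type_via_file_cmd(path: str) -> Optional[str]:
    """Ask the `file` utility for a MIME type; None if unavailable or failing."""
    file_cmd = shutil.which("file")
    if file_cmd is None:
        return None  # e.g. a platform without the file utility
    try:
        result = subprocess.run(
            [file_cmd, "--brief", "--mime-type", "--", path],
            capture_output=True, text=True, check=True, timeout=5,
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        return None
    return result.stdout.strip() or None
```

Because `file` reports errors such as unreadable paths on standard output, callers should only pass paths they already know exist.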
--
SvenDowideit - 17 Nov 2009
I have not seen any further reports of corrupted files. Changing this to no action required.
--
GeorgeClark - 07 Mar 2011