Django – Uploaded file type validation

I need to validate the type of an uploaded file and allow only PDF, plain text and MS Word files. Here are my model and the form with its validation function. However, I’m able to upload files even without an extension.

import os

from django import forms
from django.db import models

class Section(models.Model):
    content = models.FileField(upload_to="documents")

class SectionForm(forms.ModelForm):
    # Extensions (lowercase, without the dot) for PDF, plain text and MS Word files.
    FILE_EXT_WHITELIST = ['pdf', 'txt', 'doc', 'docx']

    class Meta:
        model = Section
        fields = ['content']

    def clean_content(self):
        content = self.cleaned_data['content']
        if content:
            # splitext() returns ('name', '.ext'); the extension is '' when there is none.
            ext = os.path.splitext(content.name)[1].lstrip('.').lower()
            if ext not in self.FILE_EXT_WHITELIST:
                raise forms.ValidationError("Only .pdf, .txt, .doc and .docx files are allowed.")
        return content

Here is the view:

from django.core.files.base import ContentFile

def section_update(request, object_id):
    section = models.Section.objects.get(pk=object_id)
    if 'content' in request.FILES:
        if request.FILES['content'].name.split('.')[-1] == "pdf":
            content_file = ContentFile(request.FILES['content'].read())
            content_type = "pdf"
            section.content.save("test" + '.' + content_type, content_file)
            section.save()

In my view, I’m just saving the file from request.FILES. I thought that save() would call clean_content and do the content-type validation. I guess clean_content is not being called at all.

Answer

Your approach will not work: as an attacker, I could simply forge the HTTP request to send you anything with the MIME type text/plain.

The correct solution is to use a tool like file(1) on Unix to examine the content of the file and determine what it is. Note that there is no good way to know whether something is really plain text. If the file is saved as UTF-16, the “plain text” can even contain zero (NUL) bytes.
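As a rough illustration of the idea, here is a minimal Python 3 sketch that shells out to file(1) and compares the reported MIME type against an allowlist. The names mime_from_file_cmd, is_allowed and ALLOWED_MIME_TYPES are made up for this example, and it assumes a file binary that supports the --brief and --mime-type options. The exact type reported for Word documents can differ between file versions, so treat the allowlist as a starting point rather than a definitive list.

```python
import subprocess

# Hypothetical allowlist for PDF, plain text and MS Word uploads.
ALLOWED_MIME_TYPES = {
    "application/pdf",
    "text/plain",
    "application/msword",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}

def mime_from_file_cmd(path):
    """Return what file(1) reports for path, or None if file(1) cannot be run."""
    try:
        result = subprocess.run(
            ["file", "--brief", "--mime-type", path],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # file(1) is not installed or exited with an error.
        return None
    return result.stdout.strip() or None

def is_allowed(path):
    """True only if file(1) reports one of the allowlisted MIME types."""
    return mime_from_file_cmd(path) in ALLOWED_MIME_TYPES
```

The check is made on the bytes stored on the server, so the file name and the Content-Type sent by the client play no part in the decision.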

See this question for options on how to do this: How to find the mime type of a file in python?
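If calling an external tool is not an option, a very reduced version of what file(1) does can be sketched in pure Python 3 by looking at the leading bytes of the upload. This is only an illustration under simplifying assumptions: sniff_kind and SIGNATURES are hypothetical names, the signature table covers just the formats from the question, and the "text" result is a guess, since plain text cannot be identified reliably.

```python
import codecs

# Leading byte signatures ("magic numbers") for the binary formats of interest.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy .doc, but also .xls and .ppt
    (b"PK\x03\x04", "zip"),                          # .docx, but also any other ZIP archive
]

def sniff_kind(head):
    """Guess a coarse kind from the first bytes of a file; None if unknown."""
    if not head:
        return None
    for magic, kind in SIGNATURES:
        if head.startswith(magic):
            return kind
    # UTF-16 text legitimately contains zero bytes, so look for a BOM first.
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "text"
    if b"\x00" in head:
        return None
    try:
        # An incremental decoder tolerates a multi-byte character cut off at the end.
        codecs.getincrementaldecoder("utf-8")().decode(head)
    except UnicodeDecodeError:
        return None
    return "text"
```

In a Django form this could be called with the first chunk of the upload, for example content.read(2048) followed by content.seek(0). A "zip" or "ole2" result only narrows things down to a container format, so telling a Word document apart from other files in the same container needs a closer look at the contents.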