Multipart/form-data Parsing

bc-os
Level 2
Posts: 90
Joined: Mon Feb 10, 2020 9:24 pm
Location: Boston

Multipart/form-data Parsing

Post by bc-os

I'm working on a multipart/form-data parsing library in C. Most of my information on how to handle things comes from RFC 7578 and RFC 2046.
RFC 7578 says it follows the model laid out in RFC 2046, but I'm unsure how closely. RFC 2046 (section 5.1.1) says that a delimiter consists of the preceding CRLF, two dashes ("--"), and the boundary, with the CRLF counted as part of the delimiter. So with a boundary of "boundary123", the delimiter should be:

\r\n--boundary123
However, this does not seem to apply to the first delimiter: when a browser submits the form, the very first character of the body is a dash, not '\r'. (Sending it that way is sensible, but it seems inconsistent with the spec.)
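For what it's worth, the collected grammar in RFC 2046 section 5.1.1 seems to cover this case: it begins the body with `multipart-body := [preamble CRLF] dash-boundary transport-padding CRLF ...`, so the first boundary is a bare `--boundary123` (preceded by a CRLF only when there is a preamble), while every later one is the full CRLF-prefixed delimiter. A minimal C sketch of locating that first boundary, assuming the whole body is already in memory; `find_first_boundary` is a hypothetical name, not part of any library:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the offset of the first dash-boundary
 * ("--" followed by the boundary) in body, or -1 if none is found.
 * Per the RFC 2046 grammar, the first dash-boundary either starts the
 * body (no preamble) or follows the CRLF that ends the preamble. */
static long find_first_boundary(const char *body, size_t body_len,
                                const char *boundary)
{
    assert(body != NULL && boundary != NULL);
    size_t blen = strlen(boundary);
    size_t dlen = blen + 2;            /* "--" + boundary */

    for (size_t i = 0; i + dlen <= body_len; i++) {
        /* Only positions at the start of a line can begin a boundary. */
        int at_line_start = (i == 0) ||
            (i >= 2 && body[i - 2] == '\r' && body[i - 1] == '\n');
        if (!at_line_start)
            continue;
        if (body[i] == '-' && body[i + 1] == '-' &&
            memcmp(body + i + 2, boundary, blen) == 0)
            return (long)i;
    }
    return -1;
}
```

It only checks that "--" plus the boundary starts a line; a real parser should then confirm that what follows is transport padding and a CRLF (or "--" for the close-delimiter), since "--boundary1234" also starts with "--boundary123".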
RFC 2046 also says the delimiter line may end with whitespace, called "transport padding", but I haven't come across that yet and don't know whether it is relevant to form submissions.
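On the transport-padding question: RFC 2046 defines `transport-padding := *LWSP-char` (spaces and tabs) and notes that composers must not generate non-zero-length padding but receivers must be able to handle it, which would explain why the browsers never send any. A hedged sketch of the receiving side; `skip_transport_padding` is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: p points just past a dash-boundary ("--boundary").
 * Skips optional transport padding (LWSP: spaces or tabs) and then
 * requires the CRLF that ends the delimiter line.
 * Returns the number of bytes consumed through the CRLF, or -1 if the
 * line does not end in padding + CRLF (e.g. "--b123X" for boundary "b123"). */
static long skip_transport_padding(const char *p, size_t len)
{
    assert(p != NULL);
    size_t i = 0;
    while (i < len && (p[i] == ' ' || p[i] == '\t'))
        i++;
    if (i + 1 < len && p[i] == '\r' && p[i + 1] == '\n')
        return (long)(i + 2);
    return -1;
}
```

Returning -1 for anything other than padding followed by CRLF also rejects a longer boundary that merely shares a prefix with the real one.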
Finally, it says that the end of the data, the "close-delimiter", should consist of the delimiter plus two more dashes, i.e.:

\r\n--boundary123--
and that's it: no final newline. However, the submitted data also appends a CRLF at the end.
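Regarding the trailing CRLF: the same RFC 2046 grammar ends the body with `close-delimiter transport-padding [CRLF epilogue]`, and the epilogue may be empty, so the browsers' final CRLF appears to fit as a CRLF followed by an empty epilogue. A sketch of classifying what follows a matched `--boundary123`, assuming the caller has already found the dash-boundary; the enum and function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of what follows a matched "--boundary". */
enum delim_kind { DELIM_NEXT_PART, DELIM_CLOSE, DELIM_INVALID };

/* p points just past "--boundary".
 * - "--" next means this is the close-delimiter; this sketch leniently
 *   treats everything after it (padding, CRLF, epilogue) as ignorable.
 * - Otherwise the line must end with optional transport padding and a
 *   CRLF, after which the next part's headers begin. */
static enum delim_kind classify_delimiter(const char *p, size_t len)
{
    assert(p != NULL);
    if (len >= 2 && p[0] == '-' && p[1] == '-')
        return DELIM_CLOSE;
    size_t i = 0;
    while (i < len && (p[i] == ' ' || p[i] == '\t'))
        i++;                           /* transport padding */
    if (i + 1 < len && p[i] == '\r' && p[i + 1] == '\n')
        return DELIM_NEXT_PART;
    return DELIM_INVALID;
}
```

A strict parser would accept only transport padding after the closing "--", followed by either the end of the body or a CRLF and the epilogue.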

I've tried this in Firefox and Chromium so far. Both produce identical output apart from the boundary string they choose.
Are the browsers not quite following the RFCs, or am I misunderstanding something?
Last edited by LockBot on Wed Dec 28, 2022 7:16 am, edited 1 time in total.
Reason: Topic automatically closed 6 months after creation. New replies are no longer allowed.
rene
Level 20
Posts: 12240
Joined: Sun Mar 27, 2016 6:58 pm

Re: Multipart/form-data Parsing

Post by rene

The old (original?) MIME RFC, RFC 1341, actually spends a few words on the initial CRLF, in section 7.2.1, page 31:

https://datatracker.ietf.org/doc/html/rfc1341
The requirement that the encapsulation boundary begins with a CRLF implies that the body of a multipart entity must itself begin with a CRLF before the first encapsulation line -- that is, if the "preamble" area is not used, the entity headers must be followed by TWO CRLFs. This is indeed how such entities should be composed. A tolerant mail reading program, however, may interpret a body of type multipart that begins with an encapsulation line NOT initiated by a CRLF as also being an encapsulation boundary, but a compliant mail sending program must not generate such entities.
As far as I can see, RFC 2046 has indeed dropped that wording. It seems that in the context of form data you may still consider it in effect; if you care deeply, I would agree this could be grounds for an erratum against RFC 7578, https://www.rfc-editor.org/errata_searc ... c_status=0. The additional final CRLF could simply be considered part of the "epilogue", which is to be ignored per the RFC, so that one should be fine.
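A tolerant reader in the spirit of the RFC 1341 passage above could accept the first boundary both with and without a leading CRLF, and ignore whatever follows the close-delimiter. Below is a hedged, self-contained sketch of a pull-style splitter that assumes the complete body is in memory (a streaming parser would also have to handle delimiters split across reads); `mp_iter`, `mp_init`, `mp_next` and `find_bytes` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Portable byte search (memmem() is a GNU/BSD extension, not ISO C).
 * Returns the offset of needle in hay[from..hay_len), or -1. */
static long find_bytes(const char *hay, size_t hay_len, size_t from,
                       const char *needle, size_t nlen)
{
    for (size_t i = from; i + nlen <= hay_len; i++)
        if (memcmp(hay + i, needle, nlen) == 0)
            return (long)i;
    return -1;
}

/* Hypothetical pull-style iterator over the parts of a complete body. */
struct mp_iter {
    const char *body;
    size_t len;
    char delim[4 + 70];   /* CRLF "--" boundary (RFC 2046: at most 70 chars) */
    size_t dlen;
    size_t pos;           /* offset just past the last "--boundary" seen */
};

/* Locates the first boundary: at offset 0 with no leading CRLF (what
 * browsers send), or after a leading CRLF / preamble (RFC 1341 style).
 * Returns 0 on success, -1 if no usable boundary is found. */
static int mp_init(struct mp_iter *it, const char *body, size_t len,
                   const char *boundary)
{
    assert(it != NULL && body != NULL && boundary != NULL);
    size_t blen = strlen(boundary);
    if (blen == 0 || blen > 70)
        return -1;
    it->body = body;
    it->len = len;
    memcpy(it->delim, "\r\n--", 4);
    memcpy(it->delim + 4, boundary, blen);
    it->dlen = 4 + blen;

    if (len >= it->dlen - 2 && memcmp(body, it->delim + 2, it->dlen - 2) == 0) {
        it->pos = it->dlen - 2;              /* bare "--boundary" at start */
    } else {
        long f = find_bytes(body, len, 0, it->delim, it->dlen);
        if (f < 0)
            return -1;
        it->pos = (size_t)f + it->dlen;      /* skip preamble / leading CRLF */
    }
    return 0;
}

/* Yields the next part (headers, blank line, content), excluding the CRLF
 * that belongs to the following delimiter. Returns 1 with *part/*part_len
 * set, 0 once the close-delimiter is reached, or -1 on malformed input. */
static int mp_next(struct mp_iter *it, const char **part, size_t *part_len)
{
    const char *b = it->body;
    size_t pos = it->pos;

    if (pos + 2 <= it->len && b[pos] == '-' && b[pos + 1] == '-')
        return 0;                  /* close-delimiter; the rest is epilogue */
    while (pos < it->len && (b[pos] == ' ' || b[pos] == '\t'))
        pos++;                     /* transport padding */
    if (pos + 2 > it->len || b[pos] != '\r' || b[pos + 1] != '\n')
        return -1;
    pos += 2;                      /* first byte of the part */

    long next = find_bytes(b, it->len, pos, it->delim, it->dlen);
    if (next < 0)
        return -1;                 /* no further delimiter: truncated body */
    *part = b + pos;
    *part_len = (size_t)next - pos;
    it->pos = (size_t)next + it->dlen;
    return 1;
}
```

Each yielded part still contains its own headers (e.g. `Content-Disposition: form-data; name="..."`), a blank line, and the content; a part with no headers starts directly with that blank line's CRLF.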

Together with the trailing-whitespace point, this question seems to be a matter of the paradigm "be liberal in what you accept, conservative in what you do", https://en.wikipedia.org/wiki/Robustness_principle. That is, accept a missing or stray CRLF and trailing whitespace in incoming data, but never generate them in outgoing data.
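For the "conservative in what you do" half, an encoder can emit exactly the RFC 2046 grammar: no preamble, a bare dash-boundary before the first part, the full CRLF-prefixed delimiter before each later one, no transport padding, and nothing after the closing `--`. A minimal sketch, assuming ASCII field names that need no escaping and values that never contain the delimiter (choosing a boundary that avoids collisions is left to the caller); `mp_append` and `mp_close` are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Appends one text field to out (capacity cap, current length *used).
 * The first field gets a bare "--boundary" (no preamble, no leading CRLF);
 * later fields get the full CRLF "--" boundary delimiter. No transport
 * padding is ever written. Returns 0 on success, -1 if out is too small. */
static int mp_append(char *out, size_t cap, size_t *used, const char *boundary,
                     const char *name, const char *value)
{
    assert(out != NULL && used != NULL && boundary != NULL);
    assert(strlen(boundary) >= 1 && strlen(boundary) <= 70); /* RFC 2046 limit */
    const char *lead = (*used == 0) ? "" : "\r\n";
    int n = snprintf(out + *used, cap - *used,
                     "%s--%s\r\n"
                     "Content-Disposition: form-data; name=\"%s\"\r\n"
                     "\r\n"
                     "%s",
                     lead, boundary, name, value);
    if (n < 0 || (size_t)n >= cap - *used)
        return -1;               /* truncated: caller should grow the buffer */
    *used += (size_t)n;
    return 0;
}

/* Appends the close-delimiter and stops: no trailing CRLF, no epilogue. */
static int mp_close(char *out, size_t cap, size_t *used, const char *boundary)
{
    assert(out != NULL && used != NULL && boundary != NULL);
    int n = snprintf(out + *used, cap - *used, "\r\n--%s--", boundary);
    if (n < 0 || (size_t)n >= cap - *used)
        return -1;
    *used += (size_t)n;
    return 0;
}
```

Ending right after the closing `--` is the minimal form; appending a CRLF, as Firefox and Chromium were observed to do, is also valid under the grammar (a CRLF plus an empty epilogue).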
bc-os
Level 2
Posts: 90
Joined: Mon Feb 10, 2020 9:24 pm
Location: Boston

Re: Multipart/form-data Parsing

Post by bc-os

rene wrote: Sat Nov 27, 2021 6:50 am
The old (original?) MIME RFC in fact spends some words on the initial CRLF. Section 7.2.1, page 31,
Thank you for finding that.

I'm actually dusting off an older project of mine that I started a couple of years ago. Back then, I only skimmed the RFCs to get a general idea of how they work, and I based my delimiter-finding routine on something not quite in line with the spec. Now I have to rewrite more than I care to :?