Week 5— Learning Basic Concepts of Cybersecurity
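This article introduces the evolution of the HTTP protocol and its improvements. From the original HTTP/0.9 to the modern HTTP/3.0, each generation brought significant changes: support for more methods, better connection persistence, the introduction of binary framing, and the UDP-based QUIC protocol. It also covers the content negotiation mechanism and how tools like Burp Suite work. 2025-7-21 05:30:25 Author: infosecwriteups.com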

Aang 🐦‍🔥

Hi there! If you’re wondering who I am, I go by @iamaangx028 on the internet — you can call me Aang :)

I am a student who is trying to get into the cybersecurity field. As a part of that journey, I would like to share my progress with all of you through weekly blogs.

Just a small note for continuity…

So, up to now we have learnt basic network concepts and browser concepts. Now, I think we are ready to start with web concepts. We will go slowly, one concept at a time. If you haven't read my previous blogs, I highly recommend that you cover those first. However, it's still okay if you already have the required basics. Let us go!!

Let us start week 5

Welcome to the World Wide Web

Let us try to learn the basic concepts of the World Wide Web. This phase will take more time than the network and web browser parts, but we will try to cover as many concepts as possible. So, let us start with how HTTP evolved.

The way people use the web has changed rapidly, and so has the engineering behind it! We learned that Sir Tim Berners-Lee laid the foundation for the Hypertext Transfer Protocol and the Hypertext Markup Language.

HTTP/0.9

In 1991, HTTP/0.9 was the first version of HTTP. It only had the GET method, followed by the path of the requested resource. Apart from that, there was nothing: no headers, no status codes, no Content-Type, nothing!! There was also no formal Request For Comments (RFC). It looked as follows:

REQUEST:
GET /mypage.html

RESPONSE:
<html> .... </html>

Usually, if a user wanted to access a resource, a TCP 3-way handshake happened first, followed by a GET request, and if the server was in the mood to respond, it sent the response. Then a TCP teardown happened; a TCP teardown is just how the client and server end the TCP session (in HTTP/0.9, the server closing the connection was also how the client knew the response had ended). So, a TCP connection was established and closed again for every new request, which is not efficient.
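To make the one-line request and the "close means done" behaviour concrete, here is a minimal sketch of an HTTP/0.9-style exchange. It assumes Python's `socket.socketpair()` as a stand-in for a real TCP connection (so there is no actual 3-way handshake here), and `PAGE`, `serve_once`, and `fetch_http09` are hypothetical names made up for illustration.

```python
import socket
import threading

PAGE = b"<html>Hello from HTTP/0.9</html>"  # hypothetical document

def serve_once(conn: socket.socket) -> None:
    """Toy HTTP/0.9 server: read one request line, reply with raw HTML, close."""
    request = b""
    while not request.endswith(b"\r\n"):
        chunk = conn.recv(1024)
        if not chunk:
            break
        request += chunk
    parts = request.decode("ascii").split()
    if parts == ["GET", "/mypage.html"]:
        conn.sendall(PAGE)  # no status line, no headers, just the document
    conn.close()  # closing the connection is how the response "ends"

def fetch_http09(path: str) -> bytes:
    """Send a one-line HTTP/0.9 request and read until the server closes."""
    client, server = socket.socketpair()  # stand-in for a real TCP connection
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()
    client.sendall(f"GET {path}\r\n".encode("ascii"))
    response = b""
    while chunk := client.recv(1024):  # an empty read means the server closed
        response += chunk
    client.close()
    worker.join()
    return response

print(fetch_http09("/mypage.html"))  # b'<html>Hello from HTTP/0.9</html>'
```

Notice that a missing page simply comes back empty: with no status codes, the client has no standard way to tell "not found" apart from an empty document.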

HTTP/1.0

In 1996, HTTP/1.0 was released, which laid the foundation for the modern web. HTTP/1.0 added new methods like POST and HEAD, status codes, headers like Content-Type, and content negotiation between the browser and the server. These were significant changes compared to HTTP/0.9. Engineers also figured out ways to send other media types, like images and video, along with text: with the Content-Type header in requests and responses, both server and client can understand the MIME type of the data they are dealing with. Interoperability problems were still common at that time, so RFCs (Requests for Comments) and other documentation were developed and maintained to help developers understand the web.

Still, HTTP/1.0 followed a strict request-response model: a second request is sent only after the response to the first request has arrived. And, even in HTTP/1.0, the TCP connection is by default torn down immediately after the requested resource has been transferred. Some HTTP/1.0 clients and servers did support a non-standard Connection: keep-alive header, but it was not used by default. HTTP/1.0 could also be run securely over SSL (and later TLS). So, typically, in HTTP/1.0 each request had to go through a TCP 3-way handshake and a TLS handshake, which was very expensive given how costly and valuable bandwidth was at that time. I think one of the reasons for not keeping connections alive indefinitely was the limited memory of computers at that time.
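To get a feel for why repeating both handshakes for every request hurts, here is a rough round-trip count. It is a simplified model under stated assumptions, not a measurement: a TCP handshake costs 1 round trip (RTT), a full TLS handshake costs 2 RTTs (as in TLS 1.2-style handshakes), each request/response costs 1 RTT, resources are fetched one after another, and bandwidth, server time, and congestion control are ignored. `round_trips` is a hypothetical helper.

```python
def round_trips(resources: int, persistent: bool, tls_rtts: int = 2) -> int:
    """Rough count of round trips needed to fetch `resources` files one by one.

    Simplified assumptions: TCP handshake = 1 RTT, TLS handshake = tls_rtts,
    each request/response = 1 RTT; bandwidth and server time are ignored.
    """
    setup = 1 + tls_rtts  # TCP 3-way handshake + TLS handshake
    if persistent:
        return setup + resources  # set up once, then one RTT per resource
    return resources * (setup + 1)  # full setup repeated for every resource

# A page with 10 resources over HTTPS:
print(round_trips(10, persistent=False))  # 40
print(round_trips(10, persistent=True))   # 13
```

At an assumed 50 ms round-trip time, that is roughly 2 seconds versus 0.65 seconds before a single byte of rendering work is counted.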

A major change to HTTP occurred at the end of 1994. Netscape developed an encryption layer on top of the basic TCP/IP stack, SSL, to protect traffic against eavesdropping and MITM (man-in-the-middle) attacks. SSL 1.0 was never publicly released, but later versions, starting with SSL 2.0, were published. SSL was first adopted by e-commerce websites. Later, as more powerful websites were built and needed sensitive information like PII to provide more and more services, SSL/TLS became an effective requirement for all websites.

HTTP/1.1

HTTP/1.1 was released in 1997, only a few months after HTTP/1.0. This version of HTTP was the most widely used, and most legacy applications and servers support it. HTTP/1.1 solves some of the major problems in HTTP/1.0, starting with the immediate TCP teardown after the requested resource is transferred. In HTTP/1.1, we do not need to perform the 3-way handshake and TLS handshake again for every resource requested, because persistent connections (which HTTP/1.0 could only get through the Connection: keep-alive extension) are the default. So we can request multiple resources over a single TCP and TLS session. The connection stays alive until either side sends Connection: close or the idle timeout expires, whichever comes first.

HTTP/1.1 also supports pipelining, which the earlier versions of HTTP did not have. Pipelining lets clients (like browsers) send multiple requests without waiting for the response to the first one. These improvements let browsers load pages faster than before. But, like Peter's uncle said, with great power comes great responsibility. Pipelining has an important limitation: the responses must come back in the same order the requests were sent. So, even if response 3 is ready, it is delayed whenever the response to request 1 or 2 is delayed. Sometimes it is even worse: because TCP delivers data strictly in order and retransmits lost segments (a form of Automatic Repeat Request, ARQ), a single lost packet blocks all the responses queued behind it. This is what engineers call head-of-line blocking. So, developers had to implement pipelining very carefully, otherwise page loads could stall, and it turned out to be hard to get right. In later years, some browsers disabled pipelining for fetching resources from servers.

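The in-order rule behind head-of-line blocking fits in a few lines. This sketch assumes hypothetical "ready times" (in milliseconds) for when the server could send each pipelined response; since HTTP/1.1 pipelining must return responses in request order, each response can only be delivered once everything ahead of it has been delivered. `pipelined_delivery` is a made-up helper name.

```python
def pipelined_delivery(ready_times: list[float]) -> list[float]:
    """Return when each pipelined response reaches the client.

    ready_times[i] is when the server could send response i (hypothetical
    numbers). Responses must arrive in request order, so response i is never
    delivered before response i - 1.
    """
    delivered = []
    latest = 0.0
    for ready in ready_times:
        latest = max(latest, ready)  # wait for everything queued ahead
        delivered.append(latest)
    return delivered

# Request 1 is slow (say, a heavy database query); requests 2 and 3 are quick.
print(pipelined_delivery([300.0, 20.0, 15.0]))  # [300.0, 300.0, 300.0]
```

Responses 2 and 3 were ready after 20 ms and 15 ms, yet both wait the full 300 ms for response 1.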
In HTTP/1.1, headers are not compressed; they are sent as plain text. The response body, however, can be compressed in formats like Brotli, gzip, or deflate (identity means no compression). Along with these improvements, HTTP/1.1 supports chunked transfer encoding, so a server can start sending a response before it knows its total size. HTTP/1.1 also supports several caching mechanisms. And the Host header, which is mandatory in HTTP/1.1, made it possible to host many different websites on a single server IP address, which makes efficient use of servers. While a TCP connection can be reused in HTTP/1.1 to request multiple resources, clients still have to wait for the response to the first request, which is the head-of-line blocking we already discussed. As a workaround, browsers typically open up to 6 parallel TCP connections to an origin, which improved performance. Overall, HTTP/1.1 proved extremely stable. But everything needs updates, and so did HTTP/1.1: its specification was revised in later years, for example in RFC 2616 (1999) and RFCs 7230 to 7235 (2014).
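Chunked transfer encoding is easy to see on the wire: each chunk is its size in hexadecimal, a CRLF, the data, and another CRLF, and a chunk of size 0 marks the end. Here is a minimal decoder sketch; it assumes a well-formed body, ignores chunk extensions and trailer headers, and `decode_chunked` is a hypothetical name.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode an HTTP/1.1 body sent with Transfer-Encoding: chunked.

    Each chunk is b"<size in hex>\r\n<data>\r\n"; a chunk of size 0 ends the
    body. Chunk extensions are skipped and trailer headers are not parsed.
    """
    result = b""
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size_field = body[pos:line_end].split(b";")[0]  # drop chunk extensions
        size = int(size_field, 16)
        pos = line_end + 2
        if size == 0:
            return result
        result += body[pos:pos + size]
        pos += size + 2  # skip the data and its trailing CRLF

raw = b"6\r\n<html>\r\n9\r\nHi</html>\r\n0\r\n\r\n"
print(decode_chunked(raw))  # b'<html>Hi</html>'
```

The server never had to announce a Content-Length up front; the zero-size chunk alone tells the client the body is complete.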

If you would like to learn more deeply, feel free to check out this blog

HTTP/2.0

HTTP/2.0 was released in 2015. This version of HTTP is the most used nowadays. It solved major problems in HTTP/1.1, most importantly head-of-line blocking at the HTTP level. HTTP/2.0 is a binary protocol: instead of sending requests and responses as lines of text, it splits them into binary frames and sends those frames over streams. If you think of a TCP connection as a national highway, the streams are like the different lanes on that highway, and each lane carries frames to and from the server. Because many streams share one connection and no stream has to wait for another to finish, this removed head-of-line blocking at the HTTP level and increased efficiency. If you go deeper into what these frames are, they carry the headers and the data that we send between the client and the server. The headers are compressed using HPACK. The data (request body) is not necessarily compressed by the client, unless it explicitly compresses it and declares that with a header such as Content-Encoding: gzip (or br, or deflate) in the request. This data (encoded headers and request body) is then split into the smallest unit of communication: a frame. There are different frame types, such as DATA frames and HEADERS frames. Frames are sent in streams; there can be multiple streams in one TCP connection, and every frame carries a stream ID. Frames from different streams can be interleaved during transmission and put back together per stream at the receiving end. We can also assign priorities to streams, so the most important resources are sent first. HTTP/2.0 also supports server push, a technique servers use to send extra resources along with the requested resource, even if the client has not asked for them.
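Here is a minimal sketch of the binary framing idea, assuming the 9-byte frame header layout from the HTTP/2 specification: a 24-bit length, an 8-bit type, 8 bits of flags, and a 31-bit stream ID (the top bit is reserved). It only uses DATA frames and skips HEADERS/HPACK, SETTINGS, flow control, and the connection preface, so it is an illustration rather than a working HTTP/2 client; `pack_frame` and `unpack_frames` are hypothetical names.

```python
import struct

# Frame type codes and a flag from the HTTP/2 spec (only a few are used here).
DATA, HEADERS = 0x0, 0x1
END_STREAM = 0x1

def pack_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Build one frame: 9-byte header (length, type, flags, stream ID) + payload."""
    if len(payload) >= 1 << 24:
        raise ValueError("payload too large for the 24-bit length field")
    header = struct.pack(">I", len(payload))[1:]  # 24-bit big-endian length
    header += struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
    return header + payload

def unpack_frames(data: bytes) -> list[tuple[int, int, int, bytes]]:
    """Split raw bytes into (type, flags, stream_id, payload) tuples."""
    frames = []
    offset = 0
    while offset < len(data):
        length = int.from_bytes(data[offset:offset + 3], "big")
        frame_type, flags, stream_id = struct.unpack(">BBI", data[offset + 3:offset + 9])
        payload = data[offset + 9:offset + 9 + length]
        frames.append((frame_type, flags, stream_id & 0x7FFFFFFF, payload))  # drop reserved bit
        offset += 9 + length
    return frames

# Two responses (streams 1 and 3) interleaved on one connection.
wire = b"".join([
    pack_frame(DATA, 0, 1, b"<html>"),
    pack_frame(DATA, 0, 3, b"body {"),
    pack_frame(DATA, END_STREAM, 1, b"</html>"),
    pack_frame(DATA, END_STREAM, 3, b"}"),
])

# The receiver reassembles each stream using the stream ID in every frame.
streams: dict[int, bytes] = {}
for frame_type, flags, stream_id, payload in unpack_frames(wire):
    if frame_type == DATA:
        streams[stream_id] = streams.get(stream_id, b"") + payload
print(streams)  # {1: b'<html></html>', 3: b'body {}'}
```

The frames of the two "lanes" are mixed on the wire, yet each stream comes out intact because every frame says which stream it belongs to.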

Again, just to make things clear, all of this process is taken care of by the Binary Framing layer in the Application layer.

HTTP/3.0

As websites became more and more complex, HTTP/2.0 showed some limitations. Up to now, all HTTP versions were built on TCP. TCP delivers bytes strictly in order, so whenever a packet is lost it has to be retransmitted (TCP's form of Automatic Repeat Request), and all the data that arrived after it waits until the retransmission arrives. With HTTP/2, all streams share one TCP connection, so a single lost packet stalls every stream, not just the one it belonged to. Say, for example, a lost packet carried part of an image: the frames of a JavaScript file that arrived after it are held back too, and since the browser's rendering engine gives JavaScript high priority and may wait for it (as we learned in last week's blog), rendering can stall. To avoid such scenarios, Google developed the QUIC protocol, which is built on top of UDP. UDP is a connectionless protocol. QUIC originally stood for Quick UDP Internet Connections (the IETF standard now treats QUIC simply as a name). Basically, HTTP/3 = HTTP + QUIC + UDP. Because QUIC itself handles reliability and ordering separately for each stream, when a packet is lost only that particular stream is blocked, while other streams keep flowing. More and more websites are now adopting HTTP/3.0.
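The difference between "one ordered byte stream for everything" (TCP under HTTP/2) and "ordering only within each stream" (QUIC) can be simulated in a few lines. This is a toy model under loud assumptions: the arrival times are hypothetical milliseconds, one packet is lost and its copy arrives at a fixed time, and retransmission timers, congestion control, and acknowledgements are all ignored. `delivery_times` is a made-up helper.

```python
def delivery_times(packets, lost_index, retransmit_at, per_stream_ordering):
    """Return {stream: time its last packet can be handed to the application}.

    packets: (arrival_time_ms, stream_name) pairs in the order they were sent.
    lost_index: the packet that is lost; its retransmitted copy arrives at
    retransmit_at. per_stream_ordering=False models TCP under HTTP/2 (one
    in-order byte stream shared by all streams); True models QUIC (ordering
    is enforced only within each stream).
    """
    finished = {}
    released_so_far = {}  # ordering domain -> release time of its previous packet
    for i, (arrival, stream) in enumerate(packets):
        if i == lost_index:
            arrival = retransmit_at
        domain = stream if per_stream_ordering else "whole connection"
        # In-order delivery: a packet is released only after everything sent
        # before it in the same ordering domain has been released.
        release = max(arrival, released_so_far.get(domain, 0))
        released_so_far[domain] = release
        finished[stream] = release
    return finished

# Hypothetical timings (ms): packet 1 (part of an image) is lost and its copy
# only arrives at 60 ms; the JavaScript packets arrive on time.
packets = [(10, "js"), (11, "img"), (12, "js"), (13, "img")]
tcp = delivery_times(packets, lost_index=1, retransmit_at=60, per_stream_ordering=False)
quic = delivery_times(packets, lost_index=1, retransmit_at=60, per_stream_ordering=True)
print(tcp)   # {'js': 60, 'img': 60}
print(quic)  # {'js': 12, 'img': 60}
```

Over TCP, the JavaScript file waits for the image's retransmission; with per-stream ordering, only the image stream pays for the loss.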

So, just to make things clearer:

  1. In HTTP/1.1, when you send a request from the browser, the headers are sent as uncompressed text, even though they are highly compressible. That is simply how the protocol was designed! The response body containing the actual HTML/CSS/JS, however, can be sent by the server to the client (browser) in a compressed format (Brotli, gzip, or deflate, depending on what the client accepts and what the server supports).
  2. But in HTTP/2, when you send a request from the browser, the headers are compressed and sent to the server in binary frames. This makes the server's job of parsing the headers easier. But if the request is sent in binary, how come we can see it in a human-readable format in Burp Suite when we intercept it? There must be something going on that we don't know about!!!! Yes, you are right to doubt that. Depending on the proxy settings, the connection between your browser and Burp Suite may be negotiated as HTTP/1.1 (the browser falls back to HTTP/1.1 when the proxy does not offer HTTP/2), and even when it is HTTP/2, Burp Suite decodes the frames and shows the message in an HTTP/1-style text format. That is why we can see the request in human-readable form in Burp Suite's Intercept tab. When you forward it, Burp Suite can send the request to the server over HTTP/2 if the server supports it, or you can change this setting in Burp Suite to keep it as HTTP/1.1.
  3. And if you are wondering how Burp Suite can see all of your SSL/TLS traffic, here is how: it works only because you imported PortSwigger's CA certificate into your browser's Trusted Root CA store. With this CA, Burp Suite can generate a certificate for any website on the fly and sign it itself. When your browser is proxied through Burp Suite and you open a secure website, the browser first sends a CONNECT request to Burp Suite, asking for a tunnel to that website. Burp Suite then presents the browser with a fake certificate for that hostname, signed by its own CA. Since you imported PortSwigger's CA into the trusted CA store, the browser accepts the fake certificate as legitimate. Burp Suite then makes its own connection to the website; this time the website sends its original certificate, and Burp Suite verifies it against its own list of trusted CAs. That way, Burp Suite places itself in the middle, between the browser (client) and the website (server). So, typically speaking, when your traffic is proxied through Burp Suite, the browser's server is Burp Suite, not the website, and the website's client is Burp Suite, not the browser.
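To see what "headers are compressed" in point 2 means at the smallest scale, here is a sketch of HPACK's simplest trick, the indexed header field. HPACK has a static table of common header name/value pairs (for example, index 2 is `:method: GET`), and a pair that is in the table can be sent as a single byte: the high bit set plus the index. The sketch assumes only a small slice of that static table and leaves out literals, the dynamic table, and Huffman coding, so it is far from a full encoder; `STATIC_TABLE` here and `encode_indexed` are hypothetical names. `:method`, `:scheme`, and `:path` are HTTP/2 pseudo-headers that replace the HTTP/1.1 request line.

```python
# A small slice of the HPACK static table (the full table has 61 entries).
STATIC_TABLE = {
    (":method", "GET"): 2,
    (":method", "POST"): 3,
    (":path", "/"): 4,
    (":path", "/index.html"): 5,
    (":scheme", "http"): 6,
    (":scheme", "https"): 7,
    (":status", "200"): 8,
}

def encode_indexed(headers: list[tuple[str, str]]) -> bytes:
    """Encode headers that appear verbatim in the static table.

    An indexed header field is one byte when the index fits in 7 bits:
    the high bit is set and the remaining bits hold the index. Anything not
    in the table would need a literal representation, which is not done here.
    """
    out = bytearray()
    for name, value in headers:
        index = STATIC_TABLE.get((name, value))
        if index is None:
            raise ValueError(f"{name}: {value} needs a literal (not implemented here)")
        out.append(0x80 | index)
    return bytes(out)

request = [(":method", "GET"), (":scheme", "https"), (":path", "/")]
print(encode_indexed(request).hex())  # 828784
```

Three bytes stand in for what HTTP/1.1 spells out as the text line `GET / HTTP/1.1` plus the scheme implied by the connection.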

Content Negotiation

Content negotiation is a technique for accessing a resource at a URI (from the server) in the most appropriate representation. Different clients (for example, browsers, Postman, and curl) have different ways of rendering a resource and different requirements for it. Even the same browser, such as Firefox, may expect a different representation of a resource for a desktop view and a mobile view. A representation is simply the same resource in a different format. So, the client asks the server to send the representation of the resource that suits it best, and the server examines the request, tries to find the most appropriate representation, and sends it back. The client says which formats it wants through the Accept and Accept-* headers. An example is given below:

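Accept: text/*
Accept-Encoding: br, gzip, deflate
Accept-Language: en-US
Accept-CH:

# In Accept-CH, CH means Client Hints: headers that give the server information about the device, like memory and viewport width.
# Strictly speaking, Accept-CH is sent by the server in a response to ask for client hints; the browser then sends them in request headers such as Sec-CH-UA.
# Sometimes the User-Agent (UA) header is used instead, but that is not a best practice.

User-Agent:

# Response header

Vary: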

There are two ways in which content negotiation happens:

  1. Server-driven (proactive)
  2. Agent-driven (reactive)

In server-driven negotiation, the client states what it expects through the values in these headers. The server then uses those hints with an internal algorithm to pick the most appropriate representation of the requested resource. That internal algorithm is server-specific, not standardized. Sometimes, the server cannot find a representation of the requested type. In that case, it may respond with 406 Not Acceptable, or send a default format, depending on how it is configured. (415 Unsupported Media Type is a related but different error: it means the server cannot accept the format of the body the client sent.) The server also sends a Vary header in the response, listing which request headers (for example, Accept or Accept-Language) influenced the choice of representation, so that caches and clients can handle the response correctly.
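Since the selection algorithm is server-specific, here is just one possible sketch of server-driven negotiation on the Accept header, assuming a few simple rules: each media range may carry a q-value (default 1.0), the most specific matching range wins (`text/html` beats `text/*`, which beats `*/*`), ties go to the server's own order of preference, and no match means 406 Not Acceptable. It ignores Accept-Language, Accept-Encoding, and media-type parameters. All function names are hypothetical.

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header like 'text/html, text/*;q=0.8' into (range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_range, q = fields[0], 1.0
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip() == "q":
                q = float(value)
        if media_range:
            ranges.append((media_range, q))
    return ranges

def specificity(media_range: str) -> int:
    """*/* is least specific, type/* is in between, type/subtype is most specific."""
    if media_range == "*/*":
        return 0
    return 1 if media_range.endswith("/*") else 2

def matches(media_range: str, media_type: str) -> bool:
    if media_range == "*/*":
        return True
    range_main, _, range_sub = media_range.partition("/")
    type_main, _, type_sub = media_type.partition("/")
    return range_main == type_main and range_sub in ("*", type_sub)

def quality(media_type: str, ranges: list[tuple[str, float]]) -> float:
    """q-value from the most specific matching range (0.0 if nothing matches)."""
    matching = [(specificity(r), q) for r, q in ranges if matches(r, media_type)]
    return max(matching)[1] if matching else 0.0

def negotiate(accept_header: str, available: list[str]):
    """Return (status, chosen_type, response_headers) for a hypothetical server.

    `available` is in the server's order of preference; ties go to the earlier entry.
    """
    ranges = parse_accept(accept_header) or [("*/*", 1.0)]  # no Accept: anything goes
    scored = [(quality(t, ranges), t) for t in available]
    best_q, chosen = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    headers = {"Vary": "Accept"}  # the choice depended on the Accept request header
    if best_q <= 0.0:
        return 406, None, headers  # 406 Not Acceptable
    headers["Content-Type"] = chosen
    return 200, chosen, headers

print(negotiate("text/*;q=0.8, application/json", ["text/html", "application/json"]))
# (200, 'application/json', {'Vary': 'Accept', 'Content-Type': 'application/json'})
```

With the `Accept: text/*` header from the example above, a server offering JSON and HTML would pick `text/html`, and it would still send `Vary: Accept` so caches know the answer depends on that header.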

In agent-driven negotiation, the server responds with 300 Multiple Choices, listing all the available representations. The client then chooses one from the list and requests it, and the server sends back that representation of the resource.

In practice, the server-driven approach is used most of the time.

Resources: Overall, these are the resources I used to learn and feel are worth mentioning: this MDN doc on the evolution of HTTP, for an in-depth understanding, and this YouTube video, which also helped me understand the concepts.

Some Final Chit-Chats

Ahh, that's a wrap for this week! And yeah, I feel like we covered very little this week. These topics are very important, so I took extra time to learn and understand each one by asking questions like what, why, and sometimes even what if. That took a lot of time. I also spent two days travelling for personal work, so my productivity this week dropped a lot! I hope to regain the momentum from tomorrow. And of course, we have missed some important topics; I will try to cover those in the upcoming weeks. So yeah, see you next week! Have a productive week!

Got feedback, corrections, or cool ideas for learning Web-related concepts? Hit me up on X if you have something interesting to discuss!


Article source: https://infosecwriteups.com/week-5-learning-basic-concepts-of-cybersecurity-ae310b92ab71?source=rss----7b722bfd1b8d--bug_bounty