HTTP API Authentication and Authorization Techniques [Reprinted]

Original: Chen Hao - https://coolshell.cn/articles/19395.html

We know that HTTP is stateless, so to tell whether a user is logged in, the server has to track the user's login state. Generally, after a user logs in successfully, the server issues a login credential (also called a Token). It is like visiting a company: after the front desk verifies who you are, you receive a visitor badge, and from then on you can move around the company with that badge without proving your identity again. In the computer world, the data for this login credential is kept in two places. On the client side it is stored in a Cookie (generally not in the browser's Local Storage, because any script injected through XSS can read Local Storage and steal the credential). On the server side it is stored in a Session, whose SessionID is carried in the Cookie. The real world is more complicated, though: besides users accessing services directly, there are third-party applications acting on a user's behalf and calls between enterprises. Here I would like to systematically summarize the API authentication techniques commonly used in the industry, so that everyone can get a more comprehensive picture of them.

HTTP Basic

HTTP Basic is a very traditional and relatively simple API authentication technique that logs in with a username and password. The whole process is defined in RFC 2617 (since superseded for the Basic scheme by RFC 7617); see also the Wikipedia entry "Basic access authentication" and MDN's "HTTP authentication".

The technical principle is as follows:

Join the username and password with a colon, in the form username:password.
Base64-encode the result: Base64("username:password") yields a string (for example, Base64-encoding haoel:coolshell yields aGFvZWw6Y29vbHNoZWxs).
Put this string in the HTTP Authorization header as Authorization: Basic aGFvZWw6Y29vbHNoZWxs and send the request to the server.
If the request carries no Authorization header, the server responds with 401 Unauthorized and a header such as WWW-Authenticate: Basic realm="HelloWorld", asking the client to authenticate. If the credentials are wrong, the server also returns 401; if they are correct, it processes the request normally (for example, returning 200).
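The four steps above fit in a few lines. The following is a minimal sketch using only the Python standard library; the function names `basic_auth_header` and `parse_basic_auth` are hypothetical helpers, not part of any framework. It also shows the server side, which makes it obvious that Base64 is trivially reversible.

```python
import base64
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Client side: build the value of the Authorization header."""
    # Step 1: join username and password with a colon.
    credentials = f"{username}:{password}"
    # Step 2: Base64-encode the UTF-8 bytes. This is encoding, not encryption.
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    # Step 3: prefix the token with the scheme name.
    return f"Basic {token}"


def parse_basic_auth(header_value: str) -> Tuple[str, str]:
    """Server side: recover (username, password) from the header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic" or not token:
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token, validate=True).decode("utf-8")
    # Split on the first colon only: a password may itself contain colons.
    username, sep, password = decoded.partition(":")
    if not sep:
        raise ValueError("malformed credentials")
    return username, password


if __name__ == "__main__":
    header = basic_auth_header("haoel", "coolshell")
    print("Authorization:", header)  # Authorization: Basic aGFvZWw6Y29vbHNoZWxs
    # Anyone who sees the header can reverse it, hence the need for TLS.
    print(parse_basic_auth(header))  # ('haoel', 'coolshell')
```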

We can see that the purpose of using Base64 is simply to eliminate some special characters so that they can be transmitted in the HTTP protocol. The biggest problem with this method is that it transmits the username and password over the network, so it is generally used in conjunction with TLS/SSL for secure encryption. We can see that JIRA Cloud's API authentication supports HTTP Basic authentication.

However, we still need to know that transmitting both the username and password over the public internet is not ideal because Base64 is not an encryption protocol but an encoding protocol. Therefore, even with HTTPS as a security measure, it still feels insecure.

Digest Access

Digest Access Authentication (HTTP Digest Authentication) was originally defined in RFC 2069. RFC 2617 later added a series of security options: "quality of protection" (qop), a nonce counter incremented by the client, and a client-generated nonce (cnonce).

The basic idea is that the requester combines the username, password, and realm into an MD5 hash – MD5(username:realm:password) and then sends it to the server, thus avoiding transmitting the username and password over the internet. However, since the username and password rarely change, the MD5 string is also relatively fixed. Therefore, this authentication process introduces two elements: a nonce and qop.

First, the caller initiates a regular HTTP request. For example: GET /coolshell/admin/ HTTP/1.1.
Naturally, the server cannot authenticate, so it returns a 401 error, and the HTTP header's WWW-Authenticate contains the following information:

 WWW-Authenticate: Digest realm="testrealm@host.com",
                        qop="auth,auth-int",
                        nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
                        opaque="5ccc069c403ebaf9f0171e9517f40e41"

The nonce is a random value generated by the server. The client first computes HA1=MD5(username:realm:password). (With the MD5-sess algorithm variant, HA1=MD5(MD5(username:realm:password):nonce:cnonce), where cnonce is a random value generated by the client.)
If qop is auth, the client computes HA2=MD5(method:digestURI), where method is the HTTP request method (GET/POST…) and digestURI is the requested URI.
If qop is auth-int, it computes HA2=MD5(method:digestURI:MD5(entityBody)), where entityBody is the entire body of the HTTP request.
It then computes response = MD5(HA1:nonce:nonceCount:cnonce:qop:HA2). Because the client-chosen cnonce and the incrementing nonceCount enter the hash, the response differs from request to request. If there is no qop, response = MD5(HA1:nonce:HA2).
Finally, the client sends the following request to the server—note the HTTP header's Authorization: Digest ...

GET /dir/index.html HTTP/1.0
Host: localhost
Authorization: Digest username="Mufasa",
                     realm="testrealm@host.com",
                     nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
                     uri="/dir/index.html",
                     qop=auth,
                     nc=00000001,
                     cnonce="0a4f113b",
                     response="6629fae49393a05397450978507c4ef1",
                     opaque="5ccc069c403ebaf9f0171e9517f40e41"

The Wikipedia: Digest access authentication entry describes this detail very thoroughly.
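The response computation described above can be sketched with `hashlib` alone. This is a minimal, hedged illustration covering the plain MD5 algorithm (not MD5-sess) for qop absent, auth, and auth-int; the function names are hypothetical, and the password "Circle Of Life" in the demo is an assumption borrowed from the well-known RFC 2617 example, since this article never states the password.

```python
import hashlib
from typing import Optional


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, password: str, realm: str, method: str,
                    uri: str, nonce: str, qop: Optional[str] = None,
                    nc: str = "", cnonce: str = "", body: bytes = b"") -> str:
    """Compute the Digest 'response' field (plain MD5 algorithm, not MD5-sess)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    if qop == "auth-int":
        # auth-int also protects the request body.
        ha2 = md5_hex(f"{method}:{uri}:{hashlib.md5(body).hexdigest()}")
    else:
        ha2 = md5_hex(f"{method}:{uri}")
    if qop is None:
        # Legacy RFC 2069 form, without nonce count or client nonce.
        return md5_hex(f"{ha1}:{nonce}:{ha2}")
    if qop in ("auth", "auth-int"):
        return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    raise ValueError(f"unsupported qop: {qop}")


if __name__ == "__main__":
    # Inputs mirror the sample request above; the password is assumed.
    print(digest_response(
        "Mufasa", "Circle Of Life", "testrealm@host.com", "GET",
        "/dir/index.html", "dcd98b7102dd2f0e8b11d0f600bfb0c093",
        qop="auth", nc="00000001", cnonce="0a4f113b"))
```

The server performs the same computation with the password (or HA1) it has stored and compares the result with the response field it received.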

Digest authentication improves on Basic because it never sends the user's password over the network, only an MD5 digest derived from it, which makes it relatively more secure, and it does not strictly require a TLS/SSL link. However, for all the complexity of the algorithm, the key secret in the whole process is still the user's password. If the password is weak, it can be brute-forced from a captured response, and the scheme is highly susceptible to man-in-the-middle attacks: for example, a man-in-the-middle could tell the client to use Basic authentication or the older RFC 2069 scheme.

App Secret Key + HMAC

First, let's talk about HMAC, which builds on the MAC (Message Authentication Code). A MAC is a technique for signing messages: because a message may be altered in transit, the sender runs a MAC algorithm over the message together with a secret key to produce a digest string. The recipient, who holds the same key, performs the same calculation on the received message and compares the two MAC strings. If they match, the message has not been altered (the entire process is illustrated in the diagram below). HMAC (Hash-based Message Authentication Code) builds the MAC from a cryptographic hash function, such as SHA-256.

Now let's talk about the App ID. By itself it has nothing to do with authentication; it is merely an identifier, like the number on an ID card, used to tell callers apart rather than to verify identity. Unlike the previous methods, here the server uses the App ID to look up a secret key, which lets it manage the related operations on the server side. We can issue several key pairs (AppID, AppSecret) and manage permissions at a finer granularity.
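Putting the two ideas together, a server can look up the secret by AppID, recompute the HMAC, and compare. The sketch below is a minimal illustration under stated assumptions: the in-memory `APP_SECRETS` table, the AppID and secret values, and the choice of what goes into the signed message are all hypothetical; real schemes (such as the AWS one described next) define exactly which parts of the request are signed.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical server-side key store mapping AppID -> AppSecret.
# A real system would keep this in a managed secret store, not in code.
APP_SECRETS: Dict[str, bytes] = {
    "hypothetical-app-id": b"hypothetical-app-secret",
}


def sign(secret: bytes, message: bytes) -> str:
    """HMAC-SHA256 over the message, hex-encoded."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(app_id: str, message: bytes, signature: str) -> bool:
    """Server side: look up the secret by AppID, recompute, and compare."""
    secret = APP_SECRETS.get(app_id)
    if secret is None:
        return False  # unknown or revoked AppID
    expected = sign(secret, message)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


if __name__ == "__main__":
    msg = b"GET /photos?album=1"
    sig = sign(APP_SECRETS["hypothetical-app-id"], msg)
    print(verify("hypothetical-app-id", msg, sig))                    # True
    print(verify("hypothetical-app-id", b"GET /photos?album=2", sig))  # False
```

Revoking a caller is then just a matter of deleting its entry from the key store, which is the manageability advantage discussed below.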

Using AppID and HMAC for API authentication, AWS is currently the most proficient and professional in this regard. We can see how AWS operates through the S3 API request signing documentation. The entire process is quite complex, but we can get a general idea from the flow in the image below. Essentially, it consists of the following steps:

Package the HTTP request (method, URI, query string, headers, signed header list, body hash) into a CanonicalRequest, compute its SHA-256 hash, and encode that hash in base16 (hex).
Package the hash from the previous step together with the signing algorithm AWS4-HMAC-SHA256, the timestamp, and the credential Scope into a StringToSign.
Derive the signing key: use the AWSSecretAccessKey (prefixed with "AWS4") to sign the date, producing a DateKey; use the DateKey to sign the region being operated on, producing a DateRegionKey; use that to sign the service name, producing a DateRegionServiceKey; and finally sign the string aws4_request with it, producing the SigningKey.
Use the SigningKey from the third step to sign the StringToSign from the second step with HMAC-SHA256; the hex-encoded result is the signature.
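Steps 2 to 4 can be sketched with `hmac` and `hashlib`. This is a hedged illustration, not the AWS SDK: building a real CanonicalRequest (step 1) involves precise URI-encoding and header-normalization rules that are omitted here, so the demo passes a placeholder string. The function names are hypothetical; the secret is the example key used throughout AWS documentation, and the date, region, and service match the sample header below.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Step 3: a chain of HMACs; date is YYYYMMDD."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def build_string_to_sign(amz_date: str, scope: str, canonical_request: str) -> str:
    """Step 2: algorithm, timestamp, scope, and hex SHA-256 of the CanonicalRequest."""
    hashed = hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()
    return "\n".join(["AWS4-HMAC-SHA256", amz_date, scope, hashed])


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Step 4: final HMAC-SHA256, hex-encoded."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Placeholder: a real CanonicalRequest must be built per the AWS rules.
    canonical_request = "hypothetical canonical request"
    scope = "20150830/us-east-1/iam/aws4_request"
    sts = build_string_to_sign("20150830T123600Z", scope, canonical_request)
    key = derive_signing_key("wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY",
                             "20150830", "us-east-1", "iam")
    print(f"Credential=AKIDEXAMPLE/{scope}, Signature={sign(key, sts)}")
```

Note that the secret itself never leaves the client; only the derived signature travels with the request, and the server repeats the same derivation after looking up the secret by Access Key ID.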

Finally, when sending the HTTP Request, include the following information in the Authorization field of the HTTP header:

Authorization: AWS4-HMAC-SHA256 
               Credential=AKIDEXAMPLE/20150830/us-east-1/iam/aws4_request, 
               SignedHeaders=content-type;host;x-amz-date, 
               Signature=5d672d79c15b13162d9279b0855cfba6789a8edb4c82c400e06b5924a6f2b5d7

Here, AKIDEXAMPLE is the AWS Access Key ID, which is the so-called AppID. The server will look up the relevant Secret Access Key based on this AppID and then verify the signature. If you find this process a bit confusing, you can read this article—"Amazon S3 Rest API with curl," which contains several code examples that should be the most detailed and accurate.

The advantage of this authentication method is that the AppID and AppSecretKey are issued by the server system, so they can be managed. AWS's IAM is related to this management, overseeing users, permissions, and their corresponding AppID and AppSecretKey. However, the downside is that there is no standard for this, so implementations vary widely. For example: Acquia's HMAC, WeChat's signature algorithm (here, we need to clarify that WeChat's API does not follow the HTTP protocol standard by placing authentication information in the HTTP header's Authorization but rather in the body).

JWT – JSON Web Tokens

JWT is a relatively standard authentication solution, and this technology is widely used in the Java community. JWT signing is also a form of MAC. The JWT signing process generally follows these steps:

The user requests authentication from the authentication server using their username and password.
After the authentication server verifies the username and password, it generates a JWT Token on the server side. The token generation process is as follows:
- The authentication server also generates a Secret Key.
- Base64URL-encode the JWT Header and the JWT Payload. The Payload may include the user's abstract ID and an expiration time.
- Sign them with the key: HMAC-SHA256(SecretKey, Base64UrlEncode(JWT-Header)+'.'+Base64UrlEncode(JWT-Payload)).
The client receives base64url(header).base64url(payload).base64url(signature) as the JWT token.
The client uses the JWT Token to send relevant requests to the application server. This JWT Token acts like a temporary user credential.

When the application server receives the request:

The application service checks the JWT Token to confirm that the signature is correct.
However, since only the authentication server has the user's Secret Key, the application server must send the JWT Token to the authentication server.
The authentication server decodes the user's abstract ID from the JWT Payload, then looks up the Secret Key generated during login using the abstract ID, and checks the signature.
Once the authentication server verifies it, the application service can consider this a legitimate request.

As we can see, in the process above the authentication server generates a Secret Key for each user dynamically, and the application server has to contact the authentication server to verify the signature. This adds network calls, which is why JWT supports not only HMAC-SHA256 but also RSA-based asymmetric signature algorithms.

With an RSA asymmetric algorithm, the authentication server holds the private key and the application server holds the public key. The authentication server signs the token with the private key, and the application server verifies the signature with the public key. This way, the application server does not need to call the authentication server, but RSA is slow: you save network calls at the cost of CPU, especially when the Header and Payload are long. The remedy is to first compute a fast SHA-256 hash of the header and payload and then apply RSA only to that short hash; this is exactly what JWT's RS256 algorithm (RSASSA-PKCS1-v1_5 with SHA-256) does. Finally, we need a mechanism to rotate the key pair periodically and distribute the new public key from the authentication server to the application servers.

It is strongly recommended to read the entire article "JWT: The Complete Guide to JSON Web Tokens" from Angular University.

OAuth 1.0

OAuth is also an API authorization protocol. It originated in 2006, when Twitter engineers, working on Twitter's OpenID implementation together with the social bookmarking site Ma.gnolia, realized there was no good open protocol for delegated authorization. In 2007 an OAuth group was formed; Google employees heard about it and joined to help improve the protocol. A draft was released at the end of 2007, and in 2008 OAuth was brought to the IETF for further standardization. Finally, in April 2010, OAuth 1.0 was published as RFC 5849 (an RFC far easier to read than the TCP ones). If you want to understand its predecessor draft, you can read OAuth Core 1.0 Revision A. Below is a brief description.

According to RFC 5849, OAuth addresses this kind of scenario: a user wants a third-party web printing service to print photos stored on another website, without sharing their username and password for that website with the printing service, yet the service still needs access to those photos. The OAuth protocol was created to solve this delegated authorization problem.

This protocol has three roles:
- User (photo owner)
- Consumer (third-party photo printing service)
- Service Provider (photo storage service)
This protocol has three stages:
- Consumer obtains Request Token
- Service Provider authenticates the user and authorizes the Consumer
- Consumer obtains Access Token to call the API to access the user's photos

The entire authorization process is as follows:

The Consumer (third-party photo printing service) first needs to obtain a Consumer Key and Consumer Secret from the Service Provider.
When the User accesses the Consumer, the Consumer requests a Request Token from the Service Provider (which requires signing the HTTP request).
After verifying that the Consumer is a registered third-party service provider, the Service Provider returns the Request Token (oauth_token) and Request Token Secret (oauth_token_secret).
After receiving the Request Token, the Consumer uses an HTTP GET request to redirect the User to the Service Provider's authentication page (including the Request Token), prompting the user to enter their username and password.
After successfully authenticating the User, the Service Provider redirects back to the Consumer, returning the Request Token (oauth_token) and Verification Code (oauth_verifier).
Next, the Consumer signs the request to exchange the Request Token and Verification Code for an Access Token (oauth_token) and Access Token Secret (oauth_token_secret).
Finally, the Consumer uses the Access Token to access the resources authorized by the user.

The following diagram from Yahoo! illustrates the relevant details of the entire process.

Because the above process involves three parties: User, Consumer, and Service Provider, it is also called a 3-legged flow. OAuth 1.0 also has a version that does not require user participation, involving only the Consumer and Service Provider, known as a 2-legged flow, which omits user authentication. The entire process is as follows:

The Consumer (third-party photo printing service) first needs to obtain a Consumer Key and Consumer Secret from the Service Provider.
The Consumer requests a Request Token from the Service Provider (which requires signing the HTTP request).
After verifying that the Consumer is a registered third-party service provider, the Service Provider returns the Request Token (oauth_token) and Request Token Secret (oauth_token_secret).
After receiving the Request Token, the Consumer directly exchanges it for an Access Token (oauth_token) and Access Token Secret (oauth_token_secret).
Finally, the Consumer uses the Access Token to access the protected resources on the Service Provider.

Finally, let's discuss the signing in OAuth.

We can see that there are two keys: one is the Consumer Secret issued by the Provider when the Consumer registers with the Service Provider, and the other is the Token Secret.
The signing key is formed by concatenating these two keys, using & as the connector. For example, if the Consumer Secret is j49sk3j29djd and the Token Secret is dh893hdasih9, the signing key would be: j49sk3j29djd&dh893hdasih9.
When requesting the Request/Access Tokens, the entire HTTP request must be signed (with the HMAC-SHA1 or RSA-SHA1 signature method), and the request must carry some OAuth-required fields (typically in the Authorization header), such as:
- Consumer Key: also known as AppID
- Token: Request Token or Access Token
- Signature Method: signing algorithm, e.g., HMAC-SHA1
- Timestamp: the time the request was made, in seconds since the Unix epoch
- Nonce: random string
- Call Back: callback URL
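The HMAC-SHA1 signing described above can be sketched as follows. This is a hedged, simplified illustration of the RFC 5849 approach: it builds the signature base string from the uppercase method, the percent-encoded base URL, and the sorted, percent-encoded parameters, then signs it with the `ConsumerSecret&TokenSecret` key from the example above. It assumes the caller has already collected every parameter to be signed (query, form body, and oauth_* fields except oauth_signature) and normalized the base URL; the helper names and the photos.example.net URL are hypothetical.

```python
import base64
import hashlib
import hmac
from typing import Iterable, Tuple
from urllib.parse import quote


def pct(value: str) -> str:
    """RFC 3986 percent-encoding: only unreserved characters are left as-is."""
    return quote(value, safe="~")


def signature_base_string(method: str, base_url: str,
                          params: Iterable[Tuple[str, str]]) -> str:
    # Encode every name/value, sort by name then value, and join with '&'.
    pairs = sorted((pct(k), pct(v)) for k, v in params)
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), pct(base_url), pct(normalized)])


def hmac_sha1_signature(base_string: str, consumer_secret: str,
                        token_secret: str = "") -> str:
    # Signing key = Consumer Secret & Token Secret (token secret may be empty).
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode("utf-8"), base_string.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    params = [
        ("oauth_consumer_key", "hypothetical-consumer-key"),
        ("oauth_token", "hypothetical-access-token"),
        ("oauth_signature_method", "HMAC-SHA1"),
        ("oauth_timestamp", "1191242096"),
        ("oauth_nonce", "kllo9940pd9333jh"),
        ("oauth_version", "1.0"),
        ("file", "vacation.jpg"),
    ]
    base = signature_base_string("GET", "https://photos.example.net/photos", params)
    print(base)
    print(hmac_sha1_signature(base, "j49sk3j29djd", "dh893hdasih9"))
```

The resulting value is sent as oauth_signature; the Service Provider rebuilds the same base string from the received request and recomputes the HMAC to check it.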

The following diagram illustrates the entire signing process:

OAuth 2.0

From the previous discussion, we can see that from Digest Access to AppID + HMAC, then to JWT, and finally to OAuth 1.0, all these API authentications require sending a key (or using a password) to the Client and then signing the HTTP request using HASH or RSA. One major reason for this is that earlier HTTP transmitted data in plaintext, making it easy to tamper with during transmission. Thus, a security signature mechanism was developed, allowing these authentication methods to work under the HTTP plaintext protocol.

Signing in this way is quite complex and unfriendly to developers: they have to assemble the HTTP message exactly, with various URL encodings and Base64, sort the query parameters, and in some schemes apply several layers of signing, all of which is error-prone. In addition, the security granularity of these schemes is coarse and the authorization model is rather limited, which makes them a poor fit for mobile clients acting on behalf of end users. Therefore, in 2012, OAuth 2.0 was published as RFC 6749.

OAuth 2.0 relies on TLS/SSL link encryption technology (HTTPS), completely abandoning the signing method. The authentication server no longer returns any token secret keys, making OAuth 2.0 fundamentally different from and incompatible with 1.0. Currently, Facebook's Graph API only supports the OAuth 2.0 protocol, and Google and Microsoft Azure also support OAuth 2.0. Domestic services like WeChat and Alipay also support using OAuth 2.0.

Now, let's focus on two main flows of OAuth 2.0:

One is the Authorization Code Flow, which is 3-legged.
The other is the Client Credential Flow, which is 2-legged.

Authorization Code Flow

Authorization Code is the most commonly used authorization grant type in OAuth 2.0, suitable for scenarios where users grant third-party applications access to their information. This flow is also the most complete among the four flows in OAuth 2.0, as illustrated in the flowchart below.

Here is a detailed explanation of this flow:

When the user (Resource Owner) accesses the third-party application (Client), the third-party application redirects the user to the authentication server (Authorization Server), primarily requesting the /authorize API, with the request formatted as follows:

https://login.authorization-server.com/authorize?
        client_id=6731de76-14a6-49ae-97bc-6eba6914391e
        &response_type=code
        &redirect_uri=https%3A%2F%2Fexample-client.com%2Fcallback%2F
        &scope=read
        &state=xcoiv98CoolShell3kch

Where:

client_id is the App ID of the third-party application.
response_type=code indicates to the authentication server that we want to use the Authorization Code Flow.
redirect_uri indicates the URL to redirect back to the third-party application.
scope indicates the relevant permissions.
state is a random string used primarily to prevent CSRF attacks.
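Building the /authorize URL described above is mostly a matter of correct URL encoding plus an unguessable state value. The following is a minimal sketch; `build_authorize_url` is a hypothetical helper, and the endpoint and client values simply reuse the sample URL. Storing the returned state (for example, in the user's session) so it can be checked on the callback is assumed to happen elsewhere.

```python
import secrets
from typing import Tuple
from urllib.parse import parse_qs, urlencode, urlsplit


def build_authorize_url(endpoint: str, client_id: str, redirect_uri: str,
                        scope: str) -> Tuple[str, str]:
    # An unguessable state value, to be remembered and checked on callback.
    state = secrets.token_urlsafe(16)
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",  # ask for the Authorization Code Flow
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{endpoint}?{query}", state


if __name__ == "__main__":
    url, state = build_authorize_url(
        "https://login.authorization-server.com/authorize",
        "6731de76-14a6-49ae-97bc-6eba6914391e",
        "https://example-client.com/callback/",
        "read",
    )
    print(url)
    # Round-trip check: the query decodes back to the same state.
    print(parse_qs(urlsplit(url).query)["state"] == [state])  # True
```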

When the Authorization Server receives this URL request, it checks the redirect_uri and scope against the client_id for validity. If valid, it displays a page for the user to authorize (if the user is not logged in, they will first be prompted to log in, and upon completion, the authorization access page will appear).
After the user agrees to grant access, the Authorization Server redirects back to the Client, including an Authorization Code. For example:

https://example-client.com/callback?
        code=Yzk5ZDczMzRlNDEwYlrEqdFSBzjqfTG
        &state=xcoiv98CoolShell3kch
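On the callback, the Client should check that the returned state equals the value it generated before it trusts the code. A minimal sketch, assuming the expected state was stored when the user was redirected; `extract_authorization_code` is a hypothetical helper name.

```python
import hmac
from urllib.parse import parse_qs, urlsplit


def extract_authorization_code(callback_url: str, expected_state: str) -> str:
    params = parse_qs(urlsplit(callback_url).query)
    if "error" in params:
        # RFC 6749 reports a denied or failed request via an "error" parameter.
        raise ValueError(f"authorization failed: {params['error'][0]}")
    received_state = params.get("state", [""])[0]
    # Reject callbacks whose state differs from the one we generated (CSRF defense).
    if not hmac.compare_digest(received_state.encode("utf-8"),
                               expected_state.encode("utf-8")):
        raise ValueError("state mismatch")
    codes = params.get("code")
    if not codes:
        raise ValueError("missing authorization code")
    return codes[0]


if __name__ == "__main__":
    url = ("https://example-client.com/callback?"
           "code=Yzk5ZDczMzRlNDEwYlrEqdFSBzjqfTG&state=xcoiv98CoolShell3kch")
    print(extract_authorization_code(url, "xcoiv98CoolShell3kch"))
```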

Next, the Client can use the Authorization Code to obtain an Access Token. It needs to send the following request to the Authorization Server:

POST /oauth/token HTTP/1.1
Host: authorization-server.com
Content-Type: application/x-www-form-urlencoded

grant_type=authorization_code
&code=Yzk5ZDczMzRlNDEwYlrEqdFSBzjqfTG
&redirect_uri=https%3A%2F%2Fexample-client.com%2Fcallback%2F
&client_id=6731de76-14a6-49ae-97bc-6eba6914391e
&client_secret=JqQX2PNo9bpM0uEihUPzyrh
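The token request above is an ordinary form-encoded POST. The sketch below builds it with `urllib.request.Request` without sending it; `build_token_request` is a hypothetical helper, and the https:// scheme on the token endpoint is an assumption since the sample only shows the Host header. Sending it would be a call to `urllib.request.urlopen(req)` over TLS.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_token_request(token_url: str, code: str, redirect_uri: str,
                        client_id: str, client_secret: str) -> Request:
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        # Must be identical to the redirect_uri sent to /authorize.
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded",
                 "Accept": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_token_request(
        "https://authorization-server.com/oauth/token",
        "Yzk5ZDczMzRlNDEwYlrEqdFSBzjqfTG",
        "https://example-client.com/callback/",
        "6731de76-14a6-49ae-97bc-6eba6914391e",
        "JqQX2PNo9bpM0uEihUPzyrh",
    )
    print(req.get_method(), req.full_url)
    print(parse_qs(req.data.decode("ascii")))
```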

If everything is fine, the Authorization Server will return the following information:

{
  "access_token": "iJKV1QiLCJhbGciOiJSUzI1NiI",
  "refresh_token": "1KaPlrEqdFSBzjqfTGAMxZGU",
  "token_type": "bearer",
  "expires_in": 3600,
  "id_token": "eyJ0eXAiOiJKV1QiLCJhbGciO.eyJhdWQiOiIyZDRkM..."
}

Where:

access_token is the access request token.
refresh_token is used to refresh the access_token.
id_token is a JWT defined by OpenID Connect (an identity layer built on OAuth 2.0); it generally carries the user's identity, such as their OpenID.

Next, the Client uses the Access Token to request the user's resources:

GET /v1/user/pictures HTTP/1.1
Host: example.resource.com
Authorization: Bearer iJKV1QiLCJhbGciOiJSUzI1NiI

Client Credential Flow

Client Credential is a simplified version of API authentication, primarily used for server-to-server calls, meaning there is no user involvement in the authentication process. Below is the relevant flowchart.

This process is very simple; essentially, the Client requests an Access Token from the Authorization Server using its client_id and client_secret, and then uses the Access Token to access the relevant resources.

Request example:

POST /token HTTP/1.1
Host: server.example.com
Content-Type: application/x-www-form-urlencoded

grant_type=client_credentials
&client_id=czZCaGRSa3F0Mzpn
&client_secret=7Fjfp0ZBr1KtDRbnfVdmIw

Response example:

{
  "access_token":"MTQ0NjJkZmQ5OTM2NDE1ZTZjNGZmZjI3",
  "token_type":"bearer",
  "expires_in":3600,
  "refresh_token":"IwOGYzYTlmM2YxOTQ5MGE3YmNmMDFkNTVk",
  "scope":"create"
}
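Once the Client has a token response like the one above, it needs to turn expires_in into an absolute deadline and attach the token as a Bearer header. The following is a minimal sketch; the `TokenSet` class, `parse_token_response`, and the 30-second leeway are hypothetical choices for illustration, not part of OAuth 2.0 itself.

```python
import json
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenSet:
    access_token: str
    token_type: str
    expires_at: Optional[float]
    refresh_token: Optional[str]
    scope: Optional[str]

    def authorization_header(self) -> str:
        # token_type is case-insensitive; only Bearer is handled in this sketch.
        if self.token_type.lower() != "bearer":
            raise ValueError(f"unsupported token type: {self.token_type}")
        return f"Bearer {self.access_token}"

    def is_expired(self, now: Optional[float] = None, leeway: float = 30.0) -> bool:
        # Treat the token as expired slightly early to absorb clock skew.
        if self.expires_at is None:
            return False
        now = time.time() if now is None else now
        return now >= self.expires_at - leeway


def parse_token_response(body: str, now: Optional[float] = None) -> TokenSet:
    data = json.loads(body)
    now = time.time() if now is None else now
    expires_in = data.get("expires_in")
    return TokenSet(
        access_token=data["access_token"],
        token_type=data["token_type"],
        # expires_in is relative (seconds); store an absolute deadline.
        expires_at=now + float(expires_in) if expires_in is not None else None,
        refresh_token=data.get("refresh_token"),
        scope=data.get("scope"),
    )


if __name__ == "__main__":
    body = ('{"access_token":"MTQ0NjJkZmQ5OTM2NDE1ZTZjNGZmZjI3","token_type":"bearer",'
            '"expires_in":3600,"scope":"create"}')
    tokens = parse_token_response(body)
    print(tokens.authorization_header())  # Bearer MTQ0NjJkZmQ5OTM2NDE1ZTZjNGZmZjI3
    print(tokens.is_expired())            # False
```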

Summary

Two Terms and Three Concepts

Distinguish between two terms: Authentication (verifying identity) and Authorization (granting permissions). Authentication proves who the requester is, like an ID card; authorization determines what the requester is allowed to do. Authentication requires a password, an SMS verification code, or even facial recognition. Authorization does not re-verify identity on every request: once authorization is granted, a Token is issued and presented on later requests, much like a passport (identity) and a visa (permission).
Distinguish between three concepts: encoding Base64Encode, signing HMAC, and encryption RSA. Base64 encoding is for better transmission (no unusual characters, can transmit binary files), equivalent to plaintext; HMAC signing is to prevent information tampering, while RSA encryption is to keep the information hidden.

Understanding Some Intentions

The use of complex HMAC hash signing methods primarily addresses the situation when there was no TLS/SSL encrypted link.
The purpose of placing the uid in the Token in JWT is to eliminate state while preventing users from modifying it, hence the need for signing.
OAuth 1.0 distinguishes between two entities: the third-party Client and the actual user, where the method of first obtaining a Request Token and then exchanging it for an Access Token is primarily to differentiate between third-party applications and users.
The user's password is chosen by the user, so its strength is out of our control, whereas a Secret issued by the server can be made strong; more importantly, it is easier to manage and can be revoked at any time.
The OAuth protocol offers more flexible and comprehensive configuration than the other authentication protocols discussed here. If you want AppID/AppSecret signing with per-key permissions and revocation at any time, you would need to build a system like AWS IAM to manage accounts and key pairs.

Regardless of the method, we should adhere to HTTP standards and place authentication information in the Authorization HTTP header.
Avoid using GET requests to place secrets in the URL, as many proxy or gateway software will log the entire URL in access log files.
A Secret key is equivalent to a password, except that it is used for signing. It is best not to transmit it over the network at all; if it must be transmitted, use a TLS/SSL secure link.
HMAC calculations, whether MD5 or SHA1/SHA2, are very fast, while RSA asymmetric encryption is CPU-intensive, especially when encrypting long strings.
Avoid hardcoding your Secret in the program, as many hacker tools on GitHub monitor various Secrets. Be very cautious! Such information should be placed in your configuration or deployment system, set in configuration files or environment variables at program startup.
Among AppID/AppSecret, OAuth 1.0, OAuth 2.0, and JWT, I personally recommend OAuth 2.0 over TLS/SSL.
Keys need to be managed, meaning they can be added, revoked, and associated with accounts and permissions. Ideally, keys should be automatically changeable.
It is best to separate the authentication authorization server (Authorization Server) from the application server (App Server).