This is the first post in a series: “The absolute minimum every Python web application developer must know about security.”

This first section on “security principles” is probably the most important section of the series. There are a few hard and fast rules you can apply for security, like don’t store passwords in plaintext, don’t implement your own security algorithms, but there are also many important principles that must be applied thoughtfully and with judgment and exploration.

Security best practices for Python web application development include the following, ordered very approximately from the more general principles to the more specific ones:

Security is a process, not a product (bake security thinking into every stage and process)
Have a Defence in Depth approach to security with multiple layers of protection
Never implement your own security algorithms
- Use standard, up-to-date and properly configured cryptographic algorithms
- Use open source and well-maintained libraries for security, e.g. OpenSSL
- For standard cryptography algorithms in Python use the cryptography package from the pyca (built on OpenSSL)
Data should be encrypted at rest and in transit
- Even if your systems are breached and your database stolen it should reveal no sensitive information because data is encrypted at rest
  - Use SQLAlchemy StringEncryptedType (from the sqlalchemy-utils package), with properly managed encryption keys, to store data encrypted
  - For Django there is django-encrypted-model-fields
- Even if your systems are breached and attackers snoop on your network traffic it should reveal no sensitive information because data is encrypted in transit
  - Use TLS with certificate verification for network communications, ensuring obsolete versions of TLS are disabled
  - Implement zero-trust architecture
    - don’t assume internal traffic is authenticated or privileged
    - Full application level zero trust architecture can be implemented with frameworks like OpenZiti which use different networking paradigms
Don’t rely on security by obscurity to protect data
- Don’t publicly expose data in S3, use presigned URLs
Be aware of, understand, and mitigate, the OWASP Top Ten Vulnerabilities (several are addressed here)
Use security testing techniques like creating a security requirements document and doing threat modeling
- involve security experts, or gain security expertise, amongst developers and QA teams
Use code review and security testing to find vulnerabilities, use automated tools and pen testing to verify network security
- Many serious security vulnerabilities cannot be detected with any other form of analysis or testing (The OWASP Web Security Testing Guide), so code review and developer understanding are our primary weapons
- Common vulnerabilities (checking authentication is required for your endpoints and that insecure versions of TLS are rejected for example) can be put in security testing frameworks used by several projects
- pen testing is good for verifying network security but poor for finding application vulnerabilities
Always use standard authentication and access controls (provided by a framework or delegated to an identity provider like Azure AD with OAuth2)
- Applications, and therefore permissions, are more complicated than just user/admin
  - Role Based Access Control (RBAC) is a useful model
- Follow the principle of least privilege.
  - Every service and request should have the least authority necessary to perform its function
- Deny by default
  - Only give access to needed entities or data, deny access except where specifically allowed
- Object ownership rather than roles is better for restricting access if possible (single user rather than groups)
- Log all changes to sensitive data (auditability)
- Manage the lifetime of all access tokens (JWTs) and make them short-lived (or follow OAuth 2 guidelines on revoking access)
- Never store passwords in plaintext and use key derivation functions
- Always require 2FA for login
Use tooling and technical solutions for security, including regular updates
- Use tools like pip-audit, bandit, and ruff which warn of security issues, and don’t silence warnings without confirming there is no real risk (pip-audit checks dependencies for CVEs)
  - Use these tools to gate merge requests in source code control as part of your CI pipelines
  - Never bypass code reviews and PR gating mechanisms which include security checks
  - Your source code control systems need to be secure as well as your deployed systems
- Code correctness is a security concern, so testing is an essential part of secure development
- Use uv’s project management commands (uv run, uv sync, etc.), or tools with similar features like pipenv, which record dependency hashes in a lock file and verify them on install
- Use tools like container scanning, intrusion detection, server security plugins (etc) to provide real-time protection and live security alerts
- Use the most recent version of Python and operating systems (LTS versions) possible and keep them up to date
  - Retire and replace end-of-lifetime components as they are no longer secure
  - Periodically audit installed Python packages and remove those that are not being used. Fewer packages means less chance of using a compromised library
  - Pick technologies that will last (community size is a factor in this; liberal license terms will permit maintenance by new entities)
  - Long Term Support versions of operating systems and Python stay secure for longer (if they’re updated)
  - Container scanning can help find insecure components in live systems
- Use virtual environments and containers to isolate components and their dependencies
  - With isolated components breaching or compromising one component won’t automatically compromise other components
  - With isolated dependencies, a compromised dependency only exposes a single component
Minimise externally exposed endpoints and services
- Use network segmentation to isolate sensitive components
Sanitise logging and error outputs for sensitive information
Never hard code secrets in code, always use proper secrets management (such as Helm, Vault or AWS Secrets Manager)
Never directly include external input in queries without sanitization (to protect against injection attacks)
- Using prepared statements for database queries helps prevent SQL injection attacks
- Python-based data validation libraries, including pydantic and the Django form system, can be leveraged to validate incoming data of any format — even file-based data such as CSV and JSON files
- Securing input to LLMs, and validating output, is an emerging field (MLOps: Machine Learning Operations). See e.g. CWE-1426/CWE-1427
Include CORS (Cross-Origin Resource Sharing) protection and minimize sharing in all web apps
- By default stick to the SOP (Same Origin Policy)
- There is middleware to configure CORS with Flask and aiohttp (etc) where you need Cross-Origin Resource Sharing
Use CSRF tokens in web forms to protect against Cross Site Request Forgery
- This should be handled for you by your web application framework
- A single-use CSRF token is generated by the server and included as a hidden field in the form. The token must be included in a POST to be valid, or the form submission will be rejected
- CSRF tokens help prevent replay attacks but if implemented badly may be vulnerable to token prediction attacks
Sanitise input to template rendering to protect against XSS (Cross Site Scripting) vulnerabilities (including markdown rendering)
Avoid exposing internal object references (in conjunction with access controls) to help protect against insecure direct object reference (IDOR) attacks
Don’t store or cache secrets on the client, keep them on the server
Don’t use pickle for object serialization, it’s fundamentally insecure (code execution vulnerabilities by design)
XML parsing libraries can use external references and may be insecure without proper configuration
One of the most common causes of security vulnerabilities is memory-safety bugs such as buffer overflows and underflows, so use memory-safe languages like Python and Rust
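
The CSRF token flow described in the list above (the server generates a token, embeds it as a hidden form field, and rejects any POST that does not carry it) can be sketched with the standard library. This is only an illustration of the mechanism: the function names and the plain dict standing in for a server-side session are assumptions, and in a real application the CSRF protection built into your web framework should do this for you.

```python
import hmac
import secrets
from typing import Optional


def issue_csrf_token(session: dict) -> str:
    """Create an unpredictable token and remember it in the server-side session.

    The returned value is what would be rendered into the hidden form field.
    `session` is a hypothetical stand-in for a framework's session object.
    """
    token = secrets.token_urlsafe(32)  # CSPRNG output, so tokens cannot be predicted
    session["csrf_token"] = token
    return token


def check_csrf_token(session: dict, submitted: Optional[str]) -> bool:
    """Accept a form POST only if it carries the token we issued.

    The stored token is removed on every check, so each token is single use.
    """
    expected = session.pop("csrf_token", None)
    if expected is None or submitted is None:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

Generating the token with the secrets module rather than random is what guards against the token prediction attacks mentioned above.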

CVEs and CWEs

Web application security classifies problems as vulnerabilities (specific exploits categorized using CVEs: Common Vulnerabilities and Exposures) and weaknesses (potential exploits tracked using CWEs: Common Weakness Enumeration). The CVE system was started in 1999 and is funded by the US National Cyber Security Division of the US Department of Homeland Security.

Vulnerabilities and weaknesses are assigned a CVE or CWE number. E.g.

CVE-2024-28219: In _imagingcms.c in Pillow before 10.3.0, a buffer overflow exists because strcpy is used instead of strncpy.
CWE-1427: The product uses externally provided data to build prompts provided to large language models (LLMs), but the way these prompts are constructed causes the LLM to fail to distinguish between user-supplied inputs and developer-provided system directives

There may be several vulnerabilities related to a weakness. For example, CWE-1393, the use of a default password, has many specific vulnerabilities in individual products related to it.

These warning notices are the backbone of web security and tools like pip-audit will scan your dependencies for any known vulnerabilities from these indexes. Container scanning can find and warn about known vulnerabilities in components of your container images. Tools like dependabot and renovate can automate updating versions (of app dependencies and in your base OS image) as fixes become available.

The OWASP Top Ten

The OWASP Foundation (Open Web Application Security Project) monitors the CVE and CWE indexes and curates a list of the top ten security vulnerabilities for web applications from CVE/CWE data. For a web application to be secure it must, at least, be resistant against these vulnerabilities. The current list of OWASP top ten was compiled in 2021, with an updated list due to be compiled in early 2025.

All of these vulnerabilities and weaknesses are discussed and mitigated to at least some extent in this blog post.

The OWASP Top Ten Vulnerabilities (from OWASP Top Ten | OWASP Foundation)

A01:2021-Broken Access Control moves up from the fifth position; 94% of applications were tested for some form of broken access control. The 34 Common Weakness Enumerations (CWEs) mapped to Broken Access Control had more occurrences in applications than any other category.
A02:2021-Cryptographic Failures shifts up one position to #2, previously known as Sensitive Data Exposure, which was a broad symptom rather than a root cause. The renewed focus here is on failures related to cryptography which often leads to sensitive data exposure or system compromise.
A03:2021-Injection slides down to the third position. 94% of the applications were tested for some form of injection, and the 33 CWEs mapped into this category have the second most occurrences in applications. Cross-site Scripting is now part of this category in this edition.
A04:2021-Insecure Design is a new category for 2021, with a focus on risks related to design flaws. If we genuinely want to “move left” as an industry, it calls for more use of threat modeling, secure design patterns and principles, and reference architectures.
A05:2021-Security Misconfiguration moves up from #6 in the previous edition; 90% of applications were tested for some form of misconfiguration. With more shifts into highly configurable software, it’s not surprising to see this category move up. The former category for XML External Entities (XXE) is now part of this category.
A06:2021-Vulnerable and Outdated Components was previously titled Using Components with Known Vulnerabilities and is #2 in the Top 10 community survey, but also had enough data to make the Top 10 via data analysis. This category moves up from #9 in 2017 and is a known issue that we struggle to test and assess risk. It is the only category not to have any Common Vulnerability and Exposures (CVEs) mapped to the included CWEs, so a default exploit and impact weights of 5.0 are factored into their scores.
A07:2021-Identification and Authentication Failures was previously Broken Authentication and is sliding down from the second position, and now includes CWEs that are more related to identification failures. This category is still an integral part of the Top 10, but the increased availability of standardized frameworks seems to be helping.
A08:2021-Software and Data Integrity Failures is a new category for 2021, focusing on making assumptions related to software updates, critical data, and CI/CD pipelines without verifying integrity. One of the highest weighted impacts from Common Vulnerability and Exposures/Common Vulnerability Scoring System (CVE/CVSS) data mapped to the 10 CWEs in this category. Insecure Deserialization from 2017 is now a part of this larger category.
A09:2021-Security Logging and Monitoring Failures were previously Insufficient Logging & Monitoring and is added from the industry survey (#3), moving up from #10 previously. This category is expanded to include more types of failures, is challenging to test for, and isn’t well represented in the CVE/CVSS data. However, failures in this category can directly impact visibility, incident alerting, and forensics.
A10:2021-Server-Side Request Forgery is added from the Top 10 community survey (#1). The data shows a relatively low incidence rate with above-average testing coverage, along with above-average ratings for Exploit and Impact potential. This category represents the scenario where the security community members are telling us this is important, even though it’s not illustrated in the data at this time.

Example: Injection Attacks

Injection attacks appear as number three in the OWASP Top Ten. This example shows using a query formatter to prevent injection attacks.

Here a user ID is taken from the query and used to form a SQL query fetching user details

user_id = get_query_from_request(request, "UserId")
# VULNERABLE: user input is concatenated straight into the SQL text
query = "SELECT * FROM Users WHERE UserId = " + user_id + ";"

We’re expecting the user id to be something like “105”, or any other valid ID. If instead we are supplied with something like: user_id = “105 OR 1=1” then the “OR 1=1” part of the query will now evaluate to true for every user and our query will return information for all users.
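
To make this concrete, here is a small self-contained sketch using the standard library sqlite3 module. The table contents are made up for illustration. It runs the malicious input through the concatenated query and then through a parameterized query, where the database driver treats the input purely as a value.

```python
import sqlite3

# In-memory database with two made-up users, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Users VALUES (?, ?)", [(105, "alice"), (106, "bob")])

malicious = "105 OR 1=1"

# Vulnerable: the input becomes part of the SQL text, so "OR 1=1" is executed.
unsafe_rows = conn.execute(
    "SELECT * FROM Users WHERE UserId = " + malicious
).fetchall()

# Safe: the "?" placeholder binds the input as a single value, never as SQL.
safe_rows = conn.execute(
    "SELECT * FROM Users WHERE UserId = ?", (malicious,)
).fetchall()

# A well-formed ID still works through the same parameterized query.
honest_rows = conn.execute(
    "SELECT * FROM Users WHERE UserId = ?", ("105",)
).fetchall()

print(len(unsafe_rows), len(safe_rows), len(honest_rows))
```

The concatenated query returns every user, while the parameterized query returns nothing for the malicious input because no user has the literal ID “105 OR 1=1”.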

Here’s an example of a Salesforce (SOQL) query that finds accounts in specific communities, where the community is specified by user input. The user input (community_list) passes through the format_soql function provided by the simple-salesforce library, which ensures it can’t be misinterpreted as part of the query and protects against injection attacks.

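from simple_salesforce.format import format_soql

# format_soql quotes and escapes community_list before it is substituted for {}
query = format_soql("... AND Community__r.Name IN {}", community_list)
properties = salesforce_client.query_all_iter(query)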

query will now be properly formatted so the user input cannot be interpreted as SOQL.

Useful links

A useful primer on cryptography algorithms, principles, and concepts

Crypto 101

The cryptography library from the pyca (Python Cryptography Authority, a working group of experts similar to the PyPA Python Packaging Authority)

Welcome to pyca/cryptography

A guide on XSS prevention with Flask

XSS prevention for Flask | Semgrep

Azure Data Security and Encryption Best Practises

Data security and encryption best practices – Microsoft Azure

National Cyber Security Centre Secure development and deployment guidance

Secure development principles

Best Practices and Challenges of Securing Modern Applications

Full Stack Security Guide


Password security

Username and password, with two-factor authentication, is the minimum standard for security and it is one of the most security-sensitive parts of a system.

You must not store passwords in plaintext. This is important. You must not store passwords in plaintext. An attacker must not be able to get the passwords of your users by stealing your database. If a website can send you an email with your password in plain text, it is insecure.

To avoid storing passwords in plaintext we used to store a secure hash of the password instead and compare the hash of the user input to the stored password hash. As computation speed increased and storage costs went down it became feasible to generate massive “rainbow tables” of all the hashes of every possible password (up to a certain length) for a given hash function. This breaks the “one-way” nature of hashing algorithms and allows you to go from hash back to original input.

The next step was to salt the hashes: prepending a known “salt” (some random data) to the password before hashing and applying the same salt when checking the password. To break this, an attacker needs a separate rainbow table for each salt. As these too became more feasible to generate we switched to “key derivation functions”, which are deliberately slow to compute, instead of plain “salt and hash” to protect passwords. Salts are not obsolete: key derivation functions still take one, and they turn up in other cryptographic schemes, for example when deriving a Fernet encryption key from a password.
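
As a rough sketch of what “use a key derivation function” means in practice, the example below uses PBKDF2 from the standard library hashlib module with a random per-user salt. The function names and the iteration count are assumptions for illustration only. A maintained framework or identity provider should normally handle this for you, and it may well choose a different algorithm (such as scrypt or Argon2).

```python
import hashlib
import hmac
import secrets

# Assumption: an illustrative work factor; tune it to your hardware and
# to current guidance rather than copying this number.
ITERATIONS = 600_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) to store instead of the password itself."""
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the key from the login attempt and compare in constant time."""
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(derived, expected)
```

Only the salt and the derived key are stored. The high iteration count makes each guess expensive for an attacker who has stolen the database, and the per-user salt means the same password never produces the same stored value twice.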

The state of the art changes over time; the easiest way to deal with this is to use a secure web application framework to handle login management for you, or delegate to an identity provider like Azure AD and use a common protocol like OAuth2 for authentication.

Do not store passwords in plaintext. Use an up-to-date framework to handle logins securely.

Role Based Access Control

Role-Based Access Control (RBAC) provides more fine-grained controls than authenticated/unauthenticated and is simpler to manage than permissions per user. RBAC is a form of Access Control List (ACL). The group information stored in /etc/group on Linux is an example where every group is effectively also a role.

With RBAC, functionality and access to sensitive resources are protected by requiring “roles”. “Developer” might be a role; “user” and “admin” are other common roles. Users can be members of groups and typically the roles are given to groups rather than individuals. So to make a user into an admin we add them to the “admin” group. It is easy to give users access to resources by adding them to the required groups, and it is easy to modify the permissions of a whole group by adding or removing roles.

Users are members of groups. Groups have roles that authorize them to access specific resources.

Here are some example roles from an application I’ve been working on. The authorize decorator ensures the endpoint can only be accessed by an authenticated user with [one of] the specified role(s)

@authorize("FULL_UI_AUTHORIZED_ROLES")
async def switch_order_cancellation_request(request: Request, body: JsonDict):
    ...
@authorize("OUTGOING_MESSAGE_AUTHORIZED_ROLES")
async def switch_order_request(request: Request, body: JsonDict):
    ...
@authorize("INCOMING_MESSAGE_AUTHORIZED_ROLES")
async def residential_switch_order_trigger_request(
    request: Request, hub_request: HubResidentialSwitchOrderTriggerRequest, return_envelope: Envelope
):
    ...

Object ownership rather than roles is better for restricting access if possible (single user rather than groups). There are two other principles we try to follow when designing access controls

Follow the principle of least privilege. Every service and request should have the least authority necessary to perform its function.
Deny by default: only give access to needed entities or data, deny access except where specifically allowed.
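
The group, role and deny-by-default ideas above can be sketched in a few lines. This is not the authorize decorator from the application shown earlier: require_roles, GROUP_ROLES and Forbidden are hypothetical names, the handler is synchronous, and the caller's groups are passed in directly to keep the example self-contained.

```python
from functools import wraps

# Hypothetical mapping: roles are granted to groups, not to individual users.
GROUP_ROLES = {
    "admins": {"admin", "user"},
    "staff": {"user"},
}


class Forbidden(Exception):
    """Raised when the caller holds none of the required roles."""


def roles_for(user_groups):
    """Collect the roles granted by every group the user belongs to."""
    roles = set()
    for group in user_groups:
        roles |= GROUP_ROLES.get(group, set())  # unknown groups grant nothing
    return roles


def require_roles(*allowed_roles):
    """Deny by default: run the handler only if the caller holds an allowed role."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(user_groups, *args, **kwargs):
            if not roles_for(user_groups) & set(allowed_roles):
                raise Forbidden(f"{handler.__name__} requires one of {allowed_roles}")
            return handler(user_groups, *args, **kwargs)
        return wrapper
    return decorator


@require_roles("admin")
def cancel_order(user_groups, order_id):
    return f"order {order_id} cancelled"
```

Calling cancel_order(["admins"], 7) succeeds, while a caller in only the "staff" group, or in a group the mapping does not know about, gets Forbidden. Access is granted only where a role explicitly allows it.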

OAuth2 also includes mechanisms (tokens for applications) for services to authenticate with and access other services. “On Behalf Of” tokens allow a service to access another service with the identity (and permissions) of the end user.

Access tokens are typically provided as JSON Web Tokens (JWTs) and they should be short-lived, or follow OAuth 2 guidelines on revoking access.

Alternatives to RBAC include attribute-based access control (ABAC), and permission-based access control (PBAC).

Method	Pros	Cons
RBAC	Simple and easy to manage; centralised control over user permissions; reduces administrative overhead	Can become complex with many roles; roles can become outdated or overly broad
ABAC	Centralised management of access policies; highly adaptable to different contexts and environments	Complex to implement and manage; performance overhead due to policy evaluation
PBAC	Requires sophisticated policy management tools; potential performance impact	Requires sophisticated policy management tools; potential performance impact


Tooling

“Tools do not make software secure! They help scale the process and help enforce policy.”

– OWASP Web Security Testing Guide

Using the right tools is essential to security and many can help. Static source code analysis, and live vulnerability scanning cannot identify issues due to flaws in design, since they cannot understand the context in which the code is constructed. Many security vulnerabilities can only be discovered with an understanding of the code, so awareness of security issues and code reviews play the most important part in security. Nonetheless, there are important tools that can be used both in development and as part of live systems. We’ve discussed several of them already.

A lot of security policies are integrated into software development processes, such as the Continuous Integration (test/audit before merge) pipelines as part of Source Code Control systems like GitHub and GitLab (along with container registries, package registries, and the build and deploy infrastructure) plus the support they have for processes like code review (perhaps integrated with tools like JIRA for visibility and project management).

These systems must also be secured, often through integration with identity providers and/or by running private installations accessible via a VPN. This secures your codebase and your build and deployment infrastructure. There are likely to be many components in this infrastructure in any moderately large system; your Infrastructure as Code (Terraform, for example) is often found amongst them.

Where they’re fully automated, build and deploy pipelines are often conveniently driven from SCC systems and their CI pipelines (tagging triggering release image and package building and publishing for example). Understanding the security issues and concerns around this area is “DevSecOps”, an aspect of the developer role.

Tools like argocd and Backstage can give you secure views into your deployment infrastructure (logs, terminal access, deploy, and rollback). Platforms like Amazon S3 and Aurora can give you secure, and fast database storage with backups and replication across geographical zones – integrated with the rest of your network and infrastructure (or you have to deploy, secure, and backup your own storage as part of your network/system design).

Tools that can help as part of development

bandit – checks for common security vulnerabilities
ruff, flake8, etc – these can check for some vulnerabilities, like injection attacks. ruff replaces tools like black and flake8 and is working towards replacing bandit too
pip-audit – scans dependencies for CVE vulnerabilities
mypy – static type analysis (code correctness is a security issue)
deptry/grimp – tools that can be used to analyse dependencies within a project to help find unused dependencies

There are also security tools that can be used against or as part of live deployed systems

firewalls (e.g. UFW on Linux, or Web Application Firewalls provided by AWS)
server security plugins (like the OWASP Mod Security Project)
vulnerability scanners (like Greenbone)
container scanning (like Harbor)
intrusion detection (like Snort)
rate limiting and DDOS protection (e.g. The Ingress Nginx Controller)
etc…

For full application level zero trust architecture, OpenZiti with their Python SDK, is a very impressive tool and framework. OpenZiti provides authenticate-before-connect, mTLS and E2E encryption, outbound tunneling, private DNS, etc.

There are also considerations like container building security practices. An example of this is SBOM generation. These are “systems” level concerns rather than developer level concerns. However, the two can overlap: developers typically deploy and run on infrastructure provided by a “systems team”, and the place where those concerns meet is what we call DevOps.

For example, distroless release images provide more centralized (consistent and up-to-date) security at the container distro image level and reduce the attack surface area, while changing the paradigm for developers. Keeping your base images secure (up to date) is an important concern.

Live container scanning (or the use of pip-audit) can produce a lot of security alerts on live systems and systems that are being actively developed. Security is an active space.

Automating the updating of dependencies as vulnerabilities are found in deployed versions, whether at the container image (OS) level or application dependency level, reduces the impact of this kind of live security alert. Tools like dependabot and renovate help with this.

The most important advice is to keep your systems updated! Use recent, Long Term Support, versions of your operating system and Python and keep them up to date. Retire and replace obsolete frameworks, libraries and tools before they become security vulnerabilities (or pay expensive consultants to maintain an obsolete version).

This is part of a series: “The absolute minimum every Python web application developer must know about security”. Up next: Security and Cryptography Algorithms

Author

Michael Foord

I’m a Python trainer and contractor. I specialise in teaching Python and the end-to-end automated testing of systems. My passion is for simplicity and clarity in code, efficient lightweight processes and for well designed systems. As a Python core developer I wrote parts of unittest and created the mock library which became unittest.mock.