Behind the Scenes of Python Strings: A Beginner's Guide
Data Professional with experience in civic data, AI applications, and building data pipelines to solve real-world problems. My work with government datasets has given me unique domain knowledge of how AI can drive operational efficiency and informed decision-making.
I'm now focused on bringing this expertise to AI-driven companies, with a passion for building products with a purpose, turning complex data into solutions that matter.
Email: madhura.anand@outlook.com
Disclaimer: All opinions and views expressed in any posts/blogs are my own and do not reflect the views or values of my organization.
What Is a String?
A string in Python is an immutable sequence of Unicode characters used to represent text. It's defined using single, double, or triple quotes, like "hello" or 'world'.
In Python, assigning a string looks like:
s = "hello"
Here, "hello" is a string literal: a fixed sequence of characters written directly into the code.
What Is a String Literal?
A string literal is the hard-coded text that appears in your Python code, surrounded by quotes.
Examples:
"hello" # string literal
'world' # also a string literal
"This is 1!" # another one
You're telling Python:
"Treat this exact sequence of characters as text."
This literal becomes a str object at runtime.
Literal vs dynamic string:
name = "Ru"
name += "zan" # This is not a literal; it's built dynamically
Confirm the type:
print(type("hello")) # <class 'str'>
Unicode by Default in Python 3
In Python 3, all strings are Unicode by default. That means they can represent characters from any language, emoji, or symbols, not just English.
This is a big improvement over Python 2, where the default str type was a sequence of bytes and Unicode text needed a separate unicode type.
BEFORE: How Strings Were Stored Before Unicode (Python 2 / C-style)
Strings were stored as byte arrays, where:
Each character = 1 byte (8 bits)
Encoding = ASCII (0–127)
Example in C:
char str[] = "hello";
Stored as:
[104, 101, 108, 108, 111, 0] // ASCII codes + null terminator
'h' = 104
'e' = 101
\0 = end of string
Limitations:
Only supports English (ASCII)
No native support for characters like 'ह', '你', or '😃'
Encoding/decoding had to be done manually and was error-prone
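The ASCII limitation can be reproduced from Python itself. This is a minimal sketch: `ascii_bytes` and `ascii_ok` are illustrative names, and the `"ascii"` codec stands in for a system that only knows byte values 0–127.

```python
# Encode text to bytes the way an ASCII-only system would store it.
ascii_bytes = "hello".encode("ascii")
print(list(ascii_bytes))  # [104, 101, 108, 108, 111]

# A non-ASCII character has no ASCII byte value, so encoding fails.
try:
    "\u0939".encode("ascii")  # U+0939, a Devanagari letter
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(ascii_ok)  # False
```

The byte values match the C array shown earlier, minus the null terminator, which Python does not expose.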
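NOW: How Strings Are Stored With Unicode (Python 3)
In Python 3, strings are Unicode objects, not raw bytes.
Each character maps to a Unicode code point (e.g., 'A' = U+0041, 'ह' = U+0939, '😃' = U+1F603)
The storage width is chosen per string, based on its widest character:
1 byte per character (code points up to U+00FF, which covers ASCII and Latin-1)
2 bytes per character (code points up to U+FFFF, which covers most other scripts)
4 bytes per character (code points above U+FFFF, such as emoji and rare characters)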
Python uses Flexible String Representation (PEP 393) to choose the most efficient layout.
This flexibility matters because not all characters need the same amount of memory. Without PEP 393, Python would have to use a fixed-width format like UTF-32, allocating 4 bytes per character even for simple ASCII ones like "a", which wastes memory. With PEP 393, Python picks the narrowest storage kind that fits the content of each string. This saves memory without sacrificing Unicode support, making Python both efficient and globally adaptable.
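One rough way to observe the storage width is to grow a string by one character and measure how much `sys.getsizeof` changes. This is a sketch that assumes CPython 3.3 or later; `bytes_per_char` is a hypothetical helper, not part of any library, and other Python implementations may report different sizes.

```python
import sys


def bytes_per_char(ch: str) -> int:
    """Estimate per-character storage by growing a string by one character."""
    return sys.getsizeof(ch * 101) - sys.getsizeof(ch * 100)


print(bytes_per_char("a"))           # ASCII letter
print(bytes_per_char("\u0939"))      # Devanagari letter, U+0939
print(bytes_per_char("\U0001F603"))  # emoji, U+1F603
```

On a typical CPython build this prints 1, 2, and 4. The fixed per-object overhead cancels out in the subtraction, so only the per-character cost remains.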
How Are Unicode Characters Stored?
Example 1: Hindi character
s = "ह"
print(ord(s)) # 2361
'ह' = U+0939 → this string is stored using 2 bytes per character (UCS-2 layout)
Example 2: Emoji
s = "😃"
print(ord(s)) # 128515
'😃' = U+1F603 → this string is stored using 4 bytes per character (UCS-4 layout, equivalent to UTF-32)
Python automatically chooses the most space-efficient fixed-width layout for each string.
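It helps to keep the code point, the in-memory layout, and encoded bytes apart. The sketch below uses only built-ins (`ord`, `chr`, `str.encode`); the variable names are illustrative.

```python
ch = "\U0001F603"  # the emoji U+1F603

code_point = ord(ch)
print(f"U+{code_point:04X}")        # U+1F603
print(chr(code_point) == ch)        # True: chr() reverses ord()
print(len(ch))                      # 1: one character, whatever its width
print(len(ch.encode("utf-8")))      # 4: bytes when encoded as UTF-8
print(len(ch.encode("utf-16-le")))  # 4: a surrogate pair in UTF-16
```

The internal layout is never visible through `len()` or indexing. Encodings such as UTF-8 only come into play when the text is converted to bytes, for example when writing to a file.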
Comparison Recap: Before vs Now
| Feature | Before Unicode (ASCII) | Python 3 Unicode (Modern) |
| Type | Raw bytes (str in Py2) | Unicode characters (str) |
| Per character storage | 1 byte | 1, 2, or 4 bytes (adaptive) |
| Language support | English only | All global languages + emoji |
| Encoding awareness | Manual (error-prone) | Automatic, managed by Python |
| Example stored string | [104, 101, 108, 108, 111] | Same (for ASCII); varies for Unicode |
How Are Strings Stored in Python?
1. Python Strings Are Objects
Everything in Python is an object, including strings. When you write s = "hello", Python creates an instance of the built-in str class. Internally, this is a PyUnicodeObject in CPython.
2. Strings Live on the Heap
Python stores all objects, including strings, on the heap, enabling dynamic and long-lived memory management.
When Python sees "hello", it creates the object on the heap and stores the name s in the current namespace (a dictionary mapping variable names to objects).
s is a pointer (reference) on the stack
The actual string object lives on the heap
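That a name only refers to an object, rather than containing it, can be checked directly. A minimal sketch; `t` and `same_object` are illustrative names.

```python
s = "hello"
t = s  # binds a second name to the same object; nothing is copied

same_object = t is s
print(same_object)  # True

t = "goodbye"  # rebinding t points it elsewhere and leaves s untouched
print(s, t)    # hello goodbye
```

Assignment never duplicates the string data. It only adds or changes an entry in the namespace.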
3. What's Inside a Python String?
In CPython, strings are implemented as PyUnicodeObject. It holds:
| Field | Purpose |
| length | Number of characters |
| kind | 1, 2, or 4 bytes per character |
| hash | Cached hash |
| data | Pointer to actual character data |
| interned | Whether it's interned (optimized reuse) |
Example: "hello"
5 characters
Kind: 1-byte (ASCII)
Data:
[104, 101, 108, 108, 111]
5. Variable Binding = Namespace Reference
When you do:
s = "hello"
s is a variable in the namespace (a dict of name-object pairs)
It holds a reference to the heap-allocated string
"hello" isn't copied; s just points to it.
6. Interning: String Reuse for Speed
Python may store only one copy of some strings to save memory and speed up comparisons.
s1 = "hello"
s2 = "hello"
print(s1 is s2) # Often True!
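CPython automatically interns many string constants that appear in source code, especially short ones that look like identifiers. This is an implementation detail, so compare strings with == rather than is. You can also intern a string manually with sys.intern().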
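Manual interning is easiest to see with strings built at runtime, which are not merged automatically. A small sketch using `sys.intern`; the sample text and variable names are arbitrary.

```python
import sys

# Build two equal strings at runtime so the compiler cannot merge them.
parts = ["inter", "ning", " ", "demo"]
a = "".join(parts)
b = "".join(parts)

print(a == b)  # True: same characters
print(a is b)  # usually False in CPython: two separate objects

a_interned = sys.intern(a)
b_interned = sys.intern(b)
print(a_interned is b_interned)  # True: both names share one interned object
```

Interning pays off mainly when the same strings are compared or used as dictionary keys very often. For everyday code, `==` is the right comparison.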
7. Lifetime & Garbage Collection
CPython uses reference counting to manage memory. When the last reference to a string goes away, its reference count drops to zero and its memory is freed.
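Reference counts can be observed with `sys.getrefcount`. This is a CPython-specific sketch: the absolute numbers vary between versions and builds, so only the change in the count is meaningful. The variable names are illustrative.

```python
import sys

# Build the string at runtime so it is a fresh, ordinary object.
s = "".join(["ref", "count"])

before = sys.getrefcount(s)
t = s                           # a second reference to the same object
with_alias = sys.getrefcount(s)
del t                           # drop the extra reference again
after = sys.getrefcount(s)

print(with_alias - before)  # expected 1 on standard CPython builds
print(after - before)       # expected 0 on standard CPython builds
```

Once the count for an object reaches zero, CPython releases its memory right away rather than waiting for a later collection pass.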
Bonus: What Does Immutability Mean?
Strings in Python are immutable: once created, they can't be changed. For example:
s = "hello"
s += " world"
This does not modify "hello"; it creates a new string object containing "hello world". The variable s now points to the new object, and the old one is cleaned up if nothing else references it.
Immutability ensures:
Safe sharing across functions or threads
Interning works reliably
Hash values stay constant (so strings can be used as dictionary keys)
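Immutability can be verified by keeping a second reference to the original string. A minimal sketch; `original` and `mutated` are illustrative names.

```python
s = "hello"
original = s   # keep a reference to the first object
s += " world"  # builds a new string and rebinds s

print(original)       # hello
print(s)              # hello world
print(s is original)  # False: two different objects

# In-place modification is rejected outright.
try:
    original[0] = "H"
    mutated = True
except TypeError:
    mutated = False

print(mutated)  # False
```

Because the first object never changes, any code still holding `original` keeps seeing "hello".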
Python String Quoting Styles: Comparison Table
Single quotes ('): Used for simple strings without apostrophes or embedded single quotes.
Double quotes ("): Handy when the string contains single quotes (like "It's fine").
Triple quotes (''' or """): Used for multi-line strings or strings containing both single and double quotes.
| Quote Type | Syntax | Use Case | Multiline Support | Quote Convenience |
| Single Quotes | 'Hello' | Simple strings without single quotes | No | Must escape ' like It\'s |
| Double Quotes | "Hello" | Strings with apostrophes like It's | No | Must escape " like He said, \"hi\" |
| Triple Quotes | '''Hello''' or """Hello""" | Multiline strings, docstrings | Yes | No need to escape ' or " |
Examples:
# Single quotes
name = 'Alice'
# Double quotes
quote = "It's a beautiful day"
# Triple quotes (multiline)
doc = """This is line one.
This is line two."""
# Triple quotes with both quotes inside
dialogue = """She said, "It's amazing!" """
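The quoting style only affects how the literal is written, not the resulting object. A short sketch with illustrative variable names:

```python
single = 'It\'s fine'      # the apostrophe must be escaped
double = "It's fine"       # no escape needed
triple = """It's fine"""   # no escape needed either

print(single == double == triple)    # True
print(type(single) is type(triple))  # True: all are plain str objects

multiline = """line one
line two"""
print(multiline == "line one\nline two")  # True: the line break is a \n
```

Picking the style that avoids escapes usually gives the most readable literal.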
Understanding these internals helps you write cleaner, more memory-efficient Python code and demystifies what's happening every time you type "hello"!