Behind the Scenes of Python Strings: A Beginner's Guide


Data Professional with experience in civic data, AI applications, and building data pipelines to solve real-world problems. My work with government datasets has given me unique domain knowledge of how AI can drive operational efficiency and informed decision-making.

I'm now focused on bringing this expertise to AI-driven companies, with a passion for building products with a purpose and turning complex data into solutions that matter.

📧 madhura.anand@outlook.com

Disclaimer: All opinions and views expressed in any posts/blogs are my own and do not reflect the views or values of my organization.

🔍 What is a String?

A string in Python is an immutable sequence of Unicode characters used to represent text. It's defined using single, double, or triple quotes, like "hello" or 'world'.

In Python, assigning a string looks like:

s = "hello"

Here, "hello" is a string literal: a fixed sequence of characters written directly into the code.

🔍 What is a String Literal?

A string literal is the hard-coded text that appears in your Python code, surrounded by quotes.

✅ Examples:

"hello"        # string literal  
'world'        # also a string literal  
"This is 1!"   # another one

You're telling Python:

"Treat this exact sequence of characters as text."

This literal becomes a str object at runtime.

Literal vs dynamic string:

name = "Ru"
name += "zan"  # name is now "Ruzan", a string built at runtime rather than written as a literal

Confirm the type:

print(type("hello"))  # <class 'str'>

🌍 Unicode by Default in Python 3

In Python 3, all strings are Unicode by default. That means they can represent characters from any language, emoji, or symbols, not just English.

This is a huge improvement over Python 2, where the default str type was a sequence of bytes and Unicode text needed a separate unicode type.
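As a small illustration of what "Unicode by default" means in practice, the sketch below keeps text from several scripts in ordinary str objects and converts one of them to bytes and back. The sample strings and variable names are made up for this example.

```python
# Any script fits in a plain str; no special type or prefix is needed.
samples = ["hello", "你好", "😃"]

# len() counts Unicode code points, not bytes.
lengths = [len(text) for text in samples]
print(lengths)  # [5, 2, 1]

# Raw bytes are a separate type; converting needs an explicit encoding.
encoded = "你好".encode("utf-8")
decoded = encoded.decode("utf-8")
print(type(encoded).__name__, len(encoded))  # bytes 6
print(decoded == "你好")  # True
```

The two-character string takes six bytes in UTF-8, which is why text (str) and its encoded form (bytes) are kept as distinct types.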

🕰️ BEFORE: How Strings Were Stored Before Unicode (Python 2 / C-style)

Strings were stored as byte arrays, where:

  • Each character = 1 byte (8 bits)

  • Encoding = ASCII (0–127)

🔸 Example in C:

char str[] = "hello";

Stored as:

[104, 101, 108, 108, 111, 0]  // ASCII codes + null terminator
  • 'h' = 104

  • 'e' = 101

  • \0 = end of string

❗ Limitations:

  • Only supports English (ASCII)

  • No native support for characters like 'ह', '你', or '😃'

  • Encoding/decoding had to be done manually and was error-prone
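To see the one-byte-per-character model and its limits from Python itself, here is a minimal sketch. It uses a bytes object as a stand-in for the C array (Python does not store the null terminator in a bytes value), and fits_in_ascii is a hypothetical helper written only for this example.

```python
raw = b"hello"
codes = list(raw)
print(codes)  # [104, 101, 108, 108, 111]


def fits_in_ascii(ch):
    """Return True if the character can be encoded as ASCII."""
    try:
        ch.encode("ascii")
        return True
    except UnicodeEncodeError:
        # ASCII only defines code points 0-127.
        return False


for ch in ["h", "ह", "你", "😃"]:
    print(f"U+{ord(ch):04X}", fits_in_ascii(ch))
```

Only the first character passes; the others have code points above 127 and need a wider encoding.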


✅ NOW: How Strings Are Stored With Unicode (Python 3)

In Python 3, strings are Unicode objects, not raw bytes.

Each character maps to a Unicode code point (e.g., 'A' = U+0041, 'ह' = U+0939, '😃' = U+1F603).

CPython picks one width for the whole string, based on the largest code point it contains:

  • 1 byte per character (all code points up to U+00FF: ASCII and Latin-1)

  • 2 bytes per character (largest code point up to U+FFFF: most other scripts, such as Devanagari or Chinese)

  • 4 bytes per character (anything above U+FFFF: emoji, rare characters)

Since Python 3.3, CPython uses the Flexible String Representation (PEP 393) to choose the most efficient layout.

This flexibility matters because not all characters need the same amount of memory. Before PEP 393, CPython used one fixed width for every string (2 or 4 bytes per character, depending on how the interpreter was built), so even simple ASCII text like "a" paid for the wider format. With PEP 393, Python chooses the narrowest storage kind that fits the content of each string. This saves memory without sacrificing Unicode support, making Python both efficient and globally adaptable.
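One way to observe this from Python is to compare the sizes of two strings that differ only in length, so the fixed object header cancels out. This is a rough sketch that assumes CPython 3.3 or later; bytes_per_char is a hypothetical helper, and exact sizes from sys.getsizeof vary between versions.

```python
import sys


def bytes_per_char(ch):
    """Estimate CPython's per-character storage for strings made only of ch."""
    shorter = ch * 100
    longer = ch * 1100
    # The header size is the same for both, so the difference is pure character data.
    return (sys.getsizeof(longer) - sys.getsizeof(shorter)) // 1000


print(bytes_per_char("a"))   # 1 on CPython 3.3+
print(bytes_per_char("ह"))   # 2 on CPython 3.3+
print(bytes_per_char("😃"))  # 4 on CPython 3.3+

# One wide character widens the whole string, not just itself.
ascii_only = "a" * 1000
with_emoji = "a" * 999 + "😃"
print(sys.getsizeof(ascii_only), sys.getsizeof(with_emoji))
```

The last line should show the second string taking roughly four times the memory of the first, even though 999 of its 1000 characters are plain ASCII.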


🔍 How Are Unicode Characters Stored?

Example 1: Hindi character

s = "ह"
print(ord(s))  # 2361
  • 'ह' = U+0939 → stored using 2 bytes per character (UCS-2 layout)

Example 2: Emoji

s = "😃"
print(ord(s))  # 128515
  • '😃' = U+1F603 → stored using 4 bytes per character (UCS-4 layout)

Python automatically chooses the most space-efficient fixed-width layout for each string.
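The in-memory layout is separate from the encodings you pick when writing text to a file or sending it over a network. The sketch below, with illustrative variable names, prints each character's code point next to its size in three common encodings.

```python
rows = []
for ch in ["A", "ह", "😃"]:
    code_point = ord(ch)
    assert chr(code_point) == ch  # chr() is the inverse of ord()
    rows.append((
        f"U+{code_point:04X}",
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),  # "-le" skips the byte-order mark
        len(ch.encode("utf-32-le")),
    ))

for row in rows:
    print(row)
# ('U+0041', 1, 2, 4)
# ('U+0939', 3, 2, 4)
# ('U+1F603', 4, 4, 4)
```

UTF-8 and UTF-16 are variable-width, which is why the emoji needs four bytes in both, while CPython's internal layout stays fixed-width within a single string so indexing remains fast.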


🔁 Comparison Recap: Before vs Now

| Feature | Before Unicode (ASCII) | Python 3 Unicode (Modern) |
| --- | --- | --- |
| Type | Raw bytes (str in Python 2) | Unicode characters (str) |
| Per-character storage | 1 byte | 1, 2, or 4 bytes (chosen per string) |
| Language support | English only | All global languages + emoji |
| Encoding awareness | Manual (error-prone) | Internal storage managed by Python; explicit encode/decode only at I/O boundaries |
| Example stored string | [104, 101, 108, 108, 111] | Same (for ASCII); varies for Unicode |

How Are Strings Stored in Python?

✅ 1. Python Strings Are Objects

Everything in Python is an object, including strings. When you write s = "hello", Python creates an instance of the built-in str class. Internally, this is a PyUnicodeObject in CPython.


📦 2. Strings Live on the Heap

Python stores all objects, including strings, on the heap, enabling dynamic and long-lived memory management.

When Python runs s = "hello", it creates the object on the heap and binds the name s in the current namespace (a mapping of variable names to objects).

  • s is a name that holds a reference (a pointer) to the object; it does not contain the text itself

  • The actual string object lives on the heap
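The name-to-object mapping can be inspected directly. The sketch below uses a throwaway function, demo, invented for this example, so that locals() returns a small namespace that is easy to read.

```python
def demo():
    s = "hello"
    t = s  # binds a second name to the same object; nothing is copied
    return locals(), s is t


namespace, same_object = demo()
print(namespace)    # {'s': 'hello', 't': 'hello'}
print(same_object)  # True
```

Both names appear in the namespace, and the identity check confirms they refer to one string object rather than two copies.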


🧬 3. What's Inside a Python String?

In CPython, strings are implemented as PyUnicodeObject. In simplified form, it holds:

| Field | Purpose |
| --- | --- |
| length | Number of characters (code points) |
| kind | 1, 2, or 4 bytes per character |
| hash | Cached hash, computed on first use |
| data | The actual character data (or a pointer to it) |
| interned | Whether it's interned (optimized reuse) |

🔡 Example: "hello"

  • 5 characters

  • Kind: 1-byte (ASCII)

  • Data: [104, 101, 108, 108, 111]
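Python does not expose these fields directly, but most of them can be approximated from ordinary code. In the sketch below, storage_kind is a hypothetical helper that models the PEP 393 rule (the width follows the largest code point); it does not read CPython's real internal field.

```python
def storage_kind(text):
    """Model of the PEP 393 rule: bytes per character for the whole string."""
    largest = max(map(ord, text), default=0)
    if largest < 0x100:
        return 1
    if largest < 0x10000:
        return 2
    return 4


s = "hello"
print(len(s))                    # 5
print(storage_kind(s))           # 1
print([ord(c) for c in s])       # [104, 101, 108, 108, 111]
print(hash(s) == hash(s))        # True: the hash is stable within one run
print(storage_kind("hello 😃"))  # 4
```

Adding a single emoji moves the modelled kind from 1 to 4, matching the idea that one wide character sets the width for the entire string.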


📌 5. Variable Binding = Namespace Reference

When you do:

s = "hello"
  • s is a variable in the namespace (a dict of name-object pairs)

  • It holds a reference to the heap-allocated string

"hello" isn't copied; s just points to it.


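⚡ 6. Interning: String Reuse for Speed

Python may store only one copy of some strings to save memory and speed up comparisons.

s1 = "hello"
s2 = "hello"
print(s1 is s2)  # Often True in CPython, but not guaranteed

CPython automatically interns string literals that look like identifiers, along with the names used in your code. This is an implementation detail, so compare strings with == rather than is. You can also intern a string explicitly with sys.intern().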
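The effect of sys.intern() is easier to see with strings built at runtime, since those are not shared automatically. This is a minimal sketch with illustrative variable names; the result of the plain is check is typical CPython behaviour rather than a language guarantee.

```python
import sys

# Build equal strings at runtime so they are not shared compile-time constants.
parts = ["hel", "lo"]
a = "".join(parts)
b = "".join(parts)

print(a == b)  # True: same characters
print(a is b)  # usually False in CPython: two separate objects

a_interned = sys.intern(a)
b_interned = sys.intern(b)
print(a_interned is b_interned)  # True: both refer to the single interned copy
```

After interning, equality can be decided by comparing references, which is what makes lookups of interned names fast.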


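🧹 7. Lifetime & Garbage Collection

CPython manages memory mainly through reference counting. When the last reference to a string goes away, its count drops to zero and the object is freed right away. A separate cyclic garbage collector handles reference cycles, which plain strings cannot form.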
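Reference counts can be watched with sys.getrefcount(). The sketch below assumes CPython and uses a string built at runtime, because interned or constant strings may be treated specially by newer interpreter versions; the variable names are illustrative.

```python
import sys

# Built at runtime so the string is an ordinary, non-interned object.
s = "".join(["ref", "count", " demo"])

before = sys.getrefcount(s)  # includes a temporary reference held by the call
alias = s                    # one more name bound to the same object
after = sys.getrefcount(s)
print(after - before)        # 1 on CPython

del alias                    # removing the name releases that reference
restored = sys.getrefcount(s)
print(restored == before)    # True on CPython
```

The absolute numbers are not meaningful on their own; the point is that binding a name adds one reference and deleting it removes one.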


🔐 Bonus: What Does Immutability Mean?

Strings in Python are immutable: once created, they can't be changed. For example:

s = "hello"
s += " world"

This does not modify "hello"; it creates a new string object with "hello world". The variable s now points to the new object, and the old one is cleaned up if unreferenced.

Immutability ensures:

  • Safe sharing across functions or threads

  • Interning works reliably

  • Hash values stay constant (so strings can be used as dictionary keys)
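These properties can be checked in a few lines. The sketch below keeps a second reference to the first string so both objects can be compared afterwards; the variable names are illustrative.

```python
s = "hello"
original = s          # keep a second reference to the first object

s += " world"         # builds a new str and rebinds the name s
print(original)       # hello
print(s)              # hello world
print(s is original)  # False

try:
    s[0] = "H"        # in-place modification is not supported
except TypeError as err:
    print("TypeError:", err)

# Because the hash never changes, a string is safe to use as a dict key.
counts = {original: 1}
print(counts["hello"])  # 1
```

The first string is untouched after the concatenation, which is what allows it to be shared safely and used as a key.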


🔍 Python String Quoting Styles: Comparison Table

  • Single quotes ('): Used for simple strings without apostrophes or embedded single quotes.

  • Double quotes ("): Handy when the string contains single quotes (like "It's fine").

  • Triple quotes (''' or """): Used for multi-line strings or strings containing both single and double quotes.

| Quote Type | Syntax | Use Case | Multiline Support | Quote Convenience |
| --- | --- | --- | --- | --- |
| Single Quotes | 'Hello' | Simple strings without single quotes | ❌ | Must escape ' like It\'s |
| Double Quotes | "Hello" | Strings with apostrophes like It's | ❌ | Must escape " like He said, \"hi\" |
| Triple Quotes | '''Hello''' or """Hello""" | Multiline strings, docstrings | ✅ | ✅ No need to escape ' or " |

✅ Examples:

# Single quotes
name = 'Alice'

# Double quotes
quote = "It's a beautiful day"

# Triple quotes (multiline)
doc = """This is line one.
This is line two."""

# Triple quotes with both quotes inside
dialogue = """She said, "It's amazing!" """
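The quoting style only affects how the literal is written, not the resulting object. A short sketch with made-up variable names shows that all three styles produce equal strings and that a triple-quoted line break becomes an ordinary newline character.

```python
single = 'It\'s fine'
double = "It's fine"
triple = """It's fine"""
print(single == double == triple)  # True

multi = """line one
line two"""
print(multi == "line one\nline two")  # True
print(repr(multi))  # 'line one\nline two'
```

Choosing a style is therefore a matter of readability: pick whichever one avoids the most escaping for the text at hand.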

💡 Understanding these internals helps you write cleaner, more memory-efficient Python code, and demystifies what's happening every time you type "hello"!

