Chapter 0: Basics of Python

About This Chapter

Business analytics sits at the intersection of data, computation, and decision-making. Before you can work with datasets, build models, or visualise trends, you need a reliable language to talk to a computer — and that language is Python.

Why Python?

Analysts today have several tools at their disposal: Excel is ubiquitous but breaks at scale and offers little reproducibility; R is superb for statistics but has a steeper learning curve; Stata and SAS are powerful for econometrics but are expensive, proprietary, and slow to iterate. Python avoids each of these drawbacks. It is free, open-source, readable almost like English, and backed by one of the largest developer communities in the world. When a bank such as JPMorgan wants to automate risk reports, when Netflix wants to segment subscribers, when HSBC wants to flag fraudulent transactions, Python is a language they commonly reach for.

A Brief History

Python was created by Guido van Rossum, a Dutch computer scientist, and first released in 1991. Van Rossum named it after the BBC comedy series Monty Python’s Flying Circus — a deliberate signal that programming should be fun and accessible, not arcane. The language was designed with a single guiding philosophy: There should be one obvious way to do it. This philosophy, codified in PEP 20 (“The Zen of Python”), makes Python code readable by people who did not write it — a critical property in team-based analytics.

Python 3 was released in 2008 and broke backward compatibility to fix fundamental design mistakes (integer division, Unicode handling, and more). Python 2 nevertheless remained in wide use for most of the following decade, until it reached end-of-life in January 2020. All modern analytics work, and this course, uses Python 3.

Python in the Data Science Ecosystem

Python’s strength in analytics is not the language itself but the libraries built on top of it:

Library	Purpose
NumPy	Fast array arithmetic; the foundation of almost everything else
pandas	Tabular data — think Excel with programming
matplotlib / seaborn	Publication-quality visualisations
scikit-learn	Machine learning in 5 lines of code
statsmodels	Econometrics and statistical testing
scipy	Scientific computing, distributions, optimisation

Together these form the PyData stack, used by researchers and practitioners worldwide. When you import pandas, you are accessing 15 years of engineering effort for free.

What You Will Learn in This Chapter

This chapter covers the six foundational ideas every Python user must know cold:

Variables and arithmetic — storing and computing with numbers
Strings — representing and manipulating text
Boolean logic — asking yes/no questions about data
Conditional branching (if/elif/else) — making decisions in code
For loops — repeating an action over many items
Functions — packaging logic for reuse

Each section pairs a conceptual explanation with a live, runnable code cell. Modify the code, press Shift+Enter, and see the results immediately — no installation required.

Prerequisites

None. You do not need any prior programming experience. If you have used Excel formulas, you already understand the core idea: you give the computer a rule, it applies it to data. Python is that idea, generalised to unlimited complexity.

A First Look at Python

Python has become the dominant language for business analytics, and the rest of this book assumes you can read and write it fluently at the level introduced here. Everything in this chapter runs live in your browser through an embedded Python engine, so you can read a passage, scroll to the accompanying code cell, edit it, and observe the result without installing anything. The chapter is designed to be read linearly, but the code cells are independent and can be revisited in any order.

How to use the live code cells in this book

Edit any code cell, then click the Run button to execute it.
Press Shift+Enter while the cursor is inside a cell to run it.
All output appears immediately below the code, exactly as it would in a Jupyter notebook.

Running Your First Line of Python

The print() function is the simplest way to ask Python to show you something. It sounds trivial, but the ability to inspect values at any point in your code is the single most important debugging skill you will develop. Professional analysts use print() (or its notebook equivalent) constantly to sanity-check intermediate results, to confirm that data loaded correctly, and to trace logic errors.

Notice the quotation marks around "Hello, Business Analytics!". In Python, anything inside quotes is a string — a sequence of characters treated as text, not as a number or instruction. We will cover strings in depth shortly.
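A minimal version of that first cell, using the greeting quoted above, might look like this:

```python
# Ask Python to display a piece of text (a string) on the screen.
print("Hello, Business Analytics!")
```

Changing the text between the quotation marks and re-running the cell changes what is printed.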

Interpretation: The output Hello, Business Analytics! confirms that Python is running, the Pyodide engine has loaded, and your browser can execute Python code. If you see an error instead, something went wrong with the page load — refresh and try again.

Edit the line above and run again.

Print is your best friend

Get in the habit of printing intermediate results as you build any analysis. In production code you might remove them, but during development, print() is how you verify that each step does what you expect.

In practice

When a bank loads a 10-million-row transaction file, the first thing an analyst does is print(df.shape) and print(df.head()) — the Python equivalent of checking that the file opened correctly before running any analysis.

Variables and Arithmetic

Background

A variable is a named container for a value. When you write revenue = 1_250_000, you are telling Python to reserve a location in the computer’s memory, store the integer 1,250,000 there, and label that location revenue so you can refer to it later.

Python distinguishes between several numeric types. The two you will encounter most often are:

int (integer): whole numbers such as 1, 42, -7, and 1_000_000. Underscores in integer literals (like 1_000_000) are purely cosmetic: they make large numbers readable, like a comma separator, and have no computational effect.
float (floating-point number): numbers with a decimal component — 1.5, 0.075, -3.14. Floats follow the IEEE 754 standard, which means they are stored as binary fractions. This is why 0.1 + 0.2 in Python gives 0.30000000000000004 rather than exactly 0.3. For most business calculations the error is negligible, but it matters in financial reconciliation — a topic Chapter 3 addresses.
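The floating-point behaviour described above is easy to check. The short sketch below also shows two common ways to compare floats safely: math.isclose() and rounding before the comparison. The variable name total is only illustrative.

```python
import math

total = 0.1 + 0.2
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False: the two binary fractions differ slightly
print(math.isclose(total, 0.3))  # True: equal within a small tolerance
print(round(total, 2) == 0.3)    # True: rounding first hides the tiny error
```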

In business analytics, variables naturally map to financial line items: revenue, cost, headcount, interest rate. Giving them meaningful names (gross_margin rather than gm) is not just style — it is documentation.

Before revealing the output, work the four lines on paper: what profit will Python print, and what percentage will the margin format to?
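A sketch of the calculation, assuming the revenue of 1,250,000 and cost of 880,000 used in this chapter's exercise (the variable names are illustrative):

```python
revenue = 1_250_000          # total sales, in dollars
cost = 880_000               # total cost, in dollars
profit = revenue - cost      # 370000
margin = profit / revenue    # 0.296

# ":," adds thousands separators; ":.1%" shows a fraction as a percentage.
print(f"Profit: ${profit:,}")
print(f"Margin: {margin:.1%}")
```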

Interpretation: The output shows a profit of $370,000 and a gross margin of 29.6%. In retail, a margin near 30% is respectable: supermarkets typically run 20–30% while software companies exceed 70%. For a manufacturing firm, 29.6% would be considered healthy. Context determines whether a margin figure is a red flag or a green light. The f"..." syntax, known as an f-string and introduced in Python 3.6 through PEP 498, lets you embed variable values directly in text, with :, adding thousands separators and :.1% formatting a decimal as a percentage rounded to one decimal place.

Common pitfall: integer division

In Python 2, 370000 / 1250000 returned 0 because dividing two integers gave an integer result (truncated). In Python 3, the same expression returns 0.296 as expected. Always use Python 3. If you ever need integer (floor) division explicitly, use the // operator: 7 // 2 gives 3.

In practice

At JPMorgan’s equity research desk, analysts write Python scripts that pull quarterly financials for hundreds of companies and compute margin ratios automatically — exactly this calculation, scaled to thousands of tickers. The script that runs in 2 seconds in Python would require hours of manual spreadsheet work.

Your Money, Compounded

If you put HK$10,000 of your summer-internship savings into a tracker fund returning 7% a year, how much will it be worth on your graduation day? The same arithmetic that drives revenue forecasts drives personal wealth: every dollar of return earns its own return next year. Below, the closed-form expression principal * (1 + rate) ** years is checked against a year-by-year table so you can see compounding accumulate one row at a time.
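One way to sketch that comparison, assuming a four-year horizon and the 7% return from the paragraph above (variable names are illustrative):

```python
principal = 10_000   # HK$ deposited today
rate = 0.07          # assumed constant annual return
years = 4            # assumed time until graduation

# Closed-form result: one multiplication by (1 + rate) per year.
closed_form = principal * (1 + rate) ** years
print(f"Closed form: HK${closed_form:,.2f}")

# Year-by-year table: apply one year of growth at a time.
balance = principal
for year in range(1, years + 1):
    balance = balance * (1 + rate)
    print(f"Year {year}: HK${balance:,.2f}")
```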

Interpretation: Four years at 7% turns HK$10,000 into HK$13,107.96 — a 31% gain without lifting a finger. The closed-form line and the loop agree to the cent, which is the point: the formula is the loop, rolled up. Notice the :,.2f spec produces the familiar thousand-separated, two-decimal HK-dollar format. Two assumptions deserve scrutiny: the 7% is an expected return, not a guarantee, and inflation will eat into the real purchasing power of the final amount. Chapter 3 will show how to express both uncertainty and inflation adjustment explicitly.

Rounding — `round(value, ndigits)`

Most business numbers don’t need 17 decimal places. round(value, ndigits) returns value rounded to ndigits digits after the decimal point. If you omit ndigits you get a whole number. Negative ndigits rounds to the left of the decimal — useful for “round to the nearest thousand.”

Two ways to control digits

round(x, n): returns a new number rounded to n digits. The variable x itself is unchanged unless you reassign it (x = round(x, n)).
f"{x:.2f}" — formats x for display only; the underlying variable is unchanged.

For final reports use f"..." (display rounding). Use round() when the rounded number itself will be reused — e.g. a price that must be quoted in cents, or a share count that must be a whole number.
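The difference between the two approaches can be seen in a few lines. The revenue figure is a made-up value for illustration.

```python
revenue = 1234567.891   # hypothetical figure

nearest_cent = round(revenue, 2)        # 1234567.89
nearest_unit = round(revenue)           # 1234568 (an int when ndigits is omitted)
nearest_thousand = round(revenue, -3)   # 1235000.0 (negative ndigits)

display_only = f"{revenue:,.2f}"        # the string '1,234,567.89'

print(nearest_cent, nearest_unit, nearest_thousand)
print(display_only)
print(revenue)   # still 1234567.891: neither approach modified the variable
```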

Banker’s rounding (the .5 surprise)

Python uses banker’s rounding: a value exactly halfway between two candidates (such as 2.5) rounds to the nearest even result, not always up.

This matches the default rounding mode of IEEE 754 (round half to even). Over thousands of rounded numbers it removes the upward bias that “always round half up” introduces, which is why it is the standard in banking and statistics. If you genuinely need “always round half up” (e.g. for a tax calculation that has to match a specific regulator’s rule), use math.floor(x + 0.5) when rounding to a whole number, or decimal.Decimal with an explicit rounding mode such as ROUND_HALF_UP.
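A short sketch of the behaviour, comparing the built-in round() with two half-up alternatives. The sample values are chosen to sit exactly on a .5 boundary, and the tax figure is hypothetical.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

halves = [0.5, 1.5, 2.5, 3.5]

half_even = [round(x) for x in halves]             # [0, 2, 2, 4]
half_up = [math.floor(x + 0.5) for x in halves]    # [1, 2, 3, 4]
print(half_even)
print(half_up)

# Decimal avoids binary floating point and lets you name the rounding rule.
tax = Decimal("2.345").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(tax)   # 2.35
```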

Strings: Working with Text

Background

In computing, a string is a sequence of characters. Python 3 strings are sequences of Unicode code points, which means a single string can contain English letters, Chinese characters (中文), Arabic script, mathematical symbols (∑, π), and emoji, all in the same variable. This is critical for global business: a Hong Kong company’s CRM might contain customer names in both Traditional Chinese and English, and Python handles both without special configuration.

String manipulation is one of the most common tasks in real-world analytics, and raw business data is almost never clean enough to be analysed directly. Company names arrive with inconsistent capitalisation, so the same firm appears as “apple inc”, “Apple Inc.”, and “APPLE INC” across three rows of the same file. Dates are stored as free text in mutually incompatible formats — “2024-01-15”, “Jan 15 2024”, and “15/01/24” all denote the same day. CSV files routinely embed stray spaces, tab characters, or line breaks inside fields, none of which are visible until they break a downstream join. Before any textual variable can be analysed, it must be cleaned, and Python’s built-in string methods — .strip(), .lower(), .upper(), .split(), .replace(), .startswith() — are the workhorses of that cleaning process.

f-strings (PEP 498, Python 3.6, released in December 2016) are the modern way to embed variables inside text. Before f-strings, analysts used % formatting or .format(), both of which are clunkier. An f-string prefixes the opening quote with f and uses {variable} placeholders, optionally followed by a format spec (:,.2f for a comma-separated float with two decimal places).
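The methods named above can be sketched as follows. The exact input strings are assumptions chosen for illustration; only the AAPL price of 189.50 comes from the chapter.

```python
ticker = "aapl"      # stored in lowercase by some upstream system
price = 189.5

# f-string with a method call and a format spec inside the braces.
label = f"{ticker.upper()} closed at ${price:,.2f}"
print(label)                       # AAPL closed at $189.50

school = "HKUST".lower()           # normalise case before comparing
print(school)                      # hkust

padded = "  hello  "
print(padded.strip() == "hello")   # True once the spaces are removed

tickers = "AAPL,MSFT,NVDA".split(",")   # one string becomes a list
print(tickers)                     # ['AAPL', 'MSFT', 'NVDA']
```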

Interpretation: The first line outputs AAPL closed at $189.50 — a properly formatted market data label. .upper() ensures the ticker is always in canonical uppercase regardless of how it was stored. The .lower() result (hkust) shows how to normalise case for comparison or matching. .strip() removes the two leading and trailing spaces — invisible but fatal if you try to match " hello " against "hello". .split(",") converts a comma-separated string into a Python list of three elements: this is exactly how you parse a CSV row or a list of tickers received from an API.

Common pitfall: == vs is for strings

Use == to compare string values, and never use is. The expression "aapl" == "aapl" is always True. The expression "aapl" is "aapl" may or may not be True depending on Python’s internal string interning — an implementation detail you should never rely on. Reserve is for checking whether something is None.

In practice

Before importing customer records into a CRM or database, data engineers run a cleaning pipeline in Python. Typical steps: .strip() to remove whitespace, .title() to standardise capitalisation, .replace() to fix known typos, and a regex to validate email formats. This process — called data wrangling — consumes an estimated 60–80% of a data analyst’s time on real projects.

Boolean Logic

Background

Boolean logic is named after George Boole (1815–1864), an English mathematician who showed that logical reasoning could be expressed algebraically using only two values: True and False. His 1854 work An Investigation of the Laws of Thought laid the theoretical foundation for digital computing — every bit in your computer is a Boolean.

In Python, bool is a type with exactly two possible values: True and False. Boolean expressions are produced by comparison operators:

Operator	Meaning	Example
`>`	greater than	`price > 50`
`<`	less than	`volume < 100`
`>=`	greater than or equal	`margin >= 0.20`
`<=`	less than or equal	`debt_ratio <= 0.5`
`==`	equal	`sector == "Tech"`
`!=`	not equal	`rating != "Junk"`

Multiple Boolean expressions combine with and, or, and not. These map directly to SQL WHERE clauses — WHERE price > 50 AND volume > 1000000 is identical in meaning to price > 50 and volume > 1_000_000. If you have written database queries before, you already understand Boolean logic; Python just uses different syntax.
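A small stock-screen sketch, assuming a price of 100 and a volume of 5,000,000 (variable names are illustrative):

```python
price = 100
volume = 5_000_000

high_price = price > 50             # True
high_volume = volume > 1_000_000    # True

passes_both = high_price and high_volume     # True only if every condition holds
passes_either = high_price or high_volume    # True if at least one holds
is_cheap = not high_price                    # flips True to False

print(passes_both, passes_either, is_cheap)
```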

Interpretation: Both conditions are True — the price (100) exceeds 50, and the volume (5,000,000) exceeds 1,000,000. Consequently, both and (which requires all conditions to be True) and or (which requires at least one) return True. In a stock screening context, combining conditions with and narrows the candidate set (high-price and high-volume stocks), while or broadens it. Most quant screens apply multiple and filters to isolate a small, focused list of candidates.

Common pitfall: = vs ==

A single equals sign = is assignment: it stores a value. A double equals sign == is comparison: it tests equality and returns a Boolean. Writing if price = 100: is a syntax error. This is one of the most common mistakes for beginners coming from Excel or SQL, where a single = is the equality test.

In practice

Credit approval systems at banks evaluate hundreds of Boolean conditions in real time: credit_score >= 680 and debt_to_income <= 0.43 and employment_years >= 2. A loan application passes or fails based on the combined Boolean result. This logic — once coded manually by a team of analysts — runs in Python (or a Python-based rules engine) and processes millions of applications per day.

If / Elif / Else

Background

Conditional branching is how a program makes decisions. The conceptual ancestor is the decision tree — a flowchart that routes inputs to different outcomes depending on logical tests. Python’s if/elif/else structure is that flowchart expressed in code.

The structure reads naturally: If condition A is true, do X. Else if condition B is true, do Y. Otherwise, do Z. Python uses indentation (four spaces by convention) to delimit which code belongs to each branch — there are no curly braces {}. This is one of Python’s most distinctive design choices and makes code visually clean.

A few practical guidelines:

Order matters. Python evaluates conditions top to bottom and stops at the first True match. Put your most specific (narrowest) conditions first.
elif vs separate if. Use elif when the conditions are mutually exclusive (only one branch should fire). Use separate if statements when multiple conditions can independently apply.
Dictionary lookup as an alternative. For simple label mappings with many cases, a dictionary can be cleaner and faster than a long elif chain. But for range-based conditions (like the return thresholds below), if/elif is the right tool.

With daily_return = 0.025, walk down the conditions in order — which branch is the first to fire, and what signal will print?
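A sketch of such a rule. Only the 2% threshold for STRONG BUY and the SELL fallback come from the text; the intermediate thresholds and the BUY and HOLD labels are hypothetical.

```python
daily_return = 0.025

if daily_return > 0.02:
    signal = "STRONG BUY"
elif daily_return > 0.005:       # hypothetical threshold
    signal = "BUY"
elif daily_return >= -0.005:     # hypothetical threshold
    signal = "HOLD"
else:
    signal = "SELL"

print(f"Return {daily_return:.1%} -> {signal}")
```

Because the conditions are checked top to bottom, a return of 0.025 never reaches the second test even though it would also satisfy it.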

Interpretation: A daily return of 2.5% exceeds the first threshold (2%), so the signal is STRONG BUY. In practice, a single-day gain of this magnitude is a significant move for a large-cap stock, as the S&P 500 averages roughly 0.05% per day. A rule of this kind must be calibrated to the asset class and the prevailing volatility regime: the same 2.5% move is unremarkable for a cryptocurrency but extraordinary for a 10-year Treasury bond.

As a small exercise, change daily_return to -0.03 and re-run the cell. The else branch now fires and the signal flips to SELL, demonstrating how a change in a single input can flip a trading recommendation.

Common pitfall: forgetting indentation

Python is one of the few languages where indentation is syntactically required, not just a style choice. If you write the body of an if block without indenting it, Python raises an IndentationError. The standard is 4 spaces per level. Never mix tabs and spaces — it causes hard-to-debug errors.

In practice

Risk rating systems at insurance companies and rating agencies (Moody’s, S&P) apply exactly this branching logic: if a company’s interest coverage ratio exceeds 8, assign AAA; elif above 4, assign AA; and so on down to distressed ratings. The logic is simple; the challenge is computing the input ratios correctly from financial statements — which is where Python’s data-handling libraries come in.

For Loops

Background

A for loop repeats a block of code once for each item in a sequence. This is the fundamental mechanism of automation: instead of copying and pasting the same calculation 1,000 times, you write it once and let the loop handle the repetition.

From a computer-science perspective, a loop over a list of n items has O(n) time complexity — the work grows linearly with the number of items. If processing one item takes 1 millisecond, processing 1,000 takes ~1 second, and 1,000,000 takes ~17 minutes. For very large datasets you would replace Python loops with vectorised NumPy operations (which run at C speed) or pandas methods — but understanding loops is the prerequisite to understanding why vectorisation matters.

Python’s zip() function pairs up two sequences element-by-element, letting you iterate over multiple lists simultaneously without manual index management. It is one of Python’s most elegant built-in utilities.

Key loop patterns you will use repeatedly:

Pattern	Purpose
`for item in list:`	Iterate over items
`for i, item in enumerate(list):`	Iterate with an index counter
`for a, b in zip(list1, list2):`	Iterate over two lists in parallel
`[expr for item in list]`	List comprehension (Chapter 1)
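The zip() pattern can be sketched with a four-stock price table. The AAPL and NVDA prices are the ones quoted elsewhere in this chapter; the other two tickers and their prices are placeholders chosen only so that the total matches the $1,654 discussed below.

```python
tickers = ["AAPL", "MSFT", "GOOGL", "NVDA"]
prices = [189.50, 415.00, 174.50, 875.00]   # MSFT and GOOGL are placeholder values

total = 0.0
for ticker, price in zip(tickers, prices):
    # ":6s" pads the ticker to 6 characters so the price column lines up.
    print(f"{ticker:6s} ${price:>8,.2f}")
    total += price    # one share of each stock

print(f"Total  ${total:>8,.2f}")
```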

Interpretation: The loop prints a price table for four technology stocks — a portfolio snapshot. The total portfolio value (assuming 1 share of each) is $1,654. Notice that NVDA at $875 is the most expensive and dominates a fixed-share portfolio. A $1 investment in each would be different from 1 share in each — this is the difference between equal-weight and price-weight portfolio construction, a topic that recurs in Chapter 3. The :6s format spec in the f-string pads ticker to 6 characters, aligning the columns.

Common pitfall: modifying a list while looping over it

Never add or remove items from a list while you are iterating over it with a for loop. The loop tracks its position by index, so when the list shrinks or grows underneath it, items are silently skipped or processed twice. If you need to filter a list, build a new one using a list comprehension or filter(), then reassign.
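The sketch below shows the skipped-item effect on a small list of hypothetical prices, followed by the safer pattern of building a new list.

```python
prices = [189.50, 415.00, 174.50, 875.00]   # hypothetical prices

# Buggy: removing while iterating. After 189.50 is removed, 415.00 slides
# into position 0 and the loop moves on to position 1, so it is never tested.
buggy = list(prices)
for p in buggy:
    if p > 180:
        buggy.remove(p)
print(buggy)    # [415.0, 174.5]: 415.0 wrongly survived

# Safe: build a new list that keeps only the items you want.
cheap = [p for p in prices if p <= 180]
print(cheap)    # [174.5]
```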

In practice

A quantitative analyst processing end-of-day data for 500 S&P 500 components writes a loop that fetches each company’s financial ratios, applies a scoring model, and appends the result to a summary list. This nightly batch job — a loop over 500 tickers — might run in under a minute using pandas. Without Python, the same task would require a team of analysts working for hours.

Functions

Background

A function is a named, reusable block of code. The core principle is DRY — Don’t Repeat Yourself. If you find yourself writing the same calculation in three places, you should write it once as a function and call it three times.

Functions embody the modularity principle in software design: break a complex problem into small, well-named, independently testable pieces. Bertrand Meyer, the computer scientist who formulated Design by Contract, argued for command-query separation: a function should either do something (a command, with side effects) or answer something (a query, returning a value) — not both. In analytics, most functions are queries: they take data in and return a result.

The anatomy of a Python function:

def function_name(parameter1, parameter2):
    """Docstring: explain what the function does."""
    # body: the calculation
    return result

A few rules govern this structure. The def keyword declares the function, and the indented block beneath it constitutes the function’s body. Parameters are local variables: they exist only inside the function, and any assignment to them does not leak back to the caller’s scope. The docstring, written as a triple-quoted string immediately after def, is machine-readable documentation that help(function_name) displays at the interpreter prompt. The return statement sends a value back to the caller; if a function has no explicit return, Python silently returns None, which is a frequent source of surprise for newcomers.
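A minimal sketch of such a function, assuming the name position_value, a default commission of 0.1%, and the AAPL and NVDA positions discussed below (the parameter names are illustrative):

```python
def position_value(shares, price, fee_rate=0.001):
    """Return the net proceeds from selling a position after commission."""
    gross = shares * price      # value of the shares sold
    fee = gross * fee_rate      # commission charged by the broker
    return gross - fee


aapl_net = position_value(100, 189.50)
nvda_net = position_value(50, 875.00)

print(f"AAPL net proceeds: ${aapl_net:,.2f}")   # $18,931.05
print(f"NVDA net proceeds: ${nvda_net:,.2f}")   # $43,706.25
```

Making fee_rate a parameter with a default keeps the common case short while still allowing a call such as position_value(100, 189.50, fee_rate=0.0005) for a cheaper broker.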

Interpretation: The function computes the net proceeds from selling (liquidating) a position after a 0.1% brokerage commission is deducted from the gross sale value. Selling 100 shares of AAPL at $189.50 yields $18,950 gross; after the $18.95 fee, the trader receives $18,931.05. Selling 50 shares of NVDA at $875 yields $43,750 gross; net $43,706.25. The NVDA position is roughly 2.3× larger — relevant for thinking about portfolio concentration. The 0.1% fee is realistic for retail brokers; institutional traders often pay 0.01–0.03%. To compute the cost of buying the same position, you would add the fee instead of subtracting it (return gross + fee) — the buyer pays the broker’s commission on top of the share value.

Common pitfall: mutable default arguments

Never use a mutable object (list, dictionary) as a default argument value. The expression def f(data=[]): creates one list object at function definition time, shared across all calls that use the default. Instead, use def f(data=None): if data is None: data = []. This is one of the most notorious Python gotchas and affects even experienced programmers.
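The effect is easiest to see side by side. The function names below are hypothetical.

```python
def add_ticker_buggy(ticker, watchlist=[]):
    """Append to a default list that is created only once."""
    watchlist.append(ticker)
    return watchlist


def add_ticker(ticker, watchlist=None):
    """Append to a fresh list whenever the caller does not supply one."""
    if watchlist is None:
        watchlist = []
    watchlist.append(ticker)
    return watchlist


first = add_ticker_buggy("AAPL")
second = add_ticker_buggy("NVDA")
print(second)             # ['AAPL', 'NVDA']: the first call leaked into the second
print(first is second)    # True: both names point at the same shared list

print(add_ticker("AAPL"))   # ['AAPL']
print(add_ticker("NVDA"))   # ['NVDA']
```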

In practice

A reusable position_value() function is the building block of a portfolio management system. Write it once, test it thoroughly, and call it from your backtesting engine, your risk dashboard, and your order execution module. If the broker changes its fee structure from 0.1% to 0.05%, you update one line — not dozens of spreadsheet cells scattered across multiple files.

Working with an AI Copilot

Every analyst on a trading floor, in a consulting team, or at a tech firm now works alongside an AI assistant — ChatGPT, Claude, GitHub Copilot, or a model embedded in their IDE. The bar for entry-level analysts has shifted accordingly: writing code from a blank page is no longer the differentiator. The skill that separates a good analyst from a bad one is knowing how to verify the AI’s code. An AI will happily produce a plausible-looking function that silently swaps two arguments, divides by the wrong column, or invents a pandas method that does not exist. The analyst who reads, runs, and stress-tests the output catches these errors before they reach a deck. The analyst who pastes and ships does not.

A concrete example. Suppose your code raises TypeError: unsupported operand type(s) for +: 'int' and 'str'. Compare two ways of asking for help:

Bad prompt: “fix this code”. The AI has no idea what you expected, so it guesses, and its guess often does not match your data.
Good prompt: “Here is my code: … Here is the error: TypeError: unsupported operand type(s) for +: ‘int’ and ‘str’. I expected the function to return the total revenue as an integer. What is wrong and how do I fix it?”. The AI now has the code, the symptom, and the goal — and can pinpoint the missing int() cast.
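For reference, one way this error can arise and be fixed. The data below is hypothetical: a revenue list in which one figure was loaded as text.

```python
monthly_revenue = [1200, "1500", 900]   # hypothetical data; "1500" is a string

try:
    total = sum(monthly_revenue)
except TypeError as exc:
    # Reproduces: unsupported operand type(s) for +: 'int' and 'str'
    print(f"TypeError: {exc}")

# Fix: convert every value to int before adding.
total = sum(int(value) for value in monthly_revenue)
print(total)   # 3600
```

Running the failing line yourself, as here, gives you the exact error text to paste into a good prompt.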

Three rules that will save you many hours over this course:

Always run AI code yourself before pasting it into an assignment or a report. If it crashes on your machine, it would have crashed on the grader’s machine too.
Ask the AI to explain why the code works, not just to give code. If you cannot rephrase the explanation in your own words, you do not understand it well enough to defend it in an interview.
Never trust an AI to do statistics it cannot show its working for. If the answer is a single number with no formula, no intermediate quantities, and no sanity check, treat it as a hypothesis to verify — not a conclusion.

Exercises

This exercise pulls together everything you have read so far. The firm posts revenue of $1,250,000 against cost of $880,000, producing a 29.6% margin. Your task is to reduce cost enough to push profit above $420,000, which corresponds to a margin of 33.6%, slightly above one-third of revenue.

The framing may look like a toy problem, but the underlying question — how much cost reduction is needed to hit a target margin? — is the standard exercise every CFO works through during budget season. At industrial scale the only difference is that revenue and cost are aggregated from thousands of rows of transactional data rather than entered as scalar literals.

Modify the value of cost below so that profit exceeds $420,000, then run the cell.

After editing, click the Run button to evaluate the cell.
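A sketch of the exercise cell, starting from the figures given above. The exact layout of the print statements is an assumption.

```python
revenue = 1_250_000
cost = 880_000        # edit this value

profit = revenue - cost
margin = profit / revenue

print(f"Profit: ${profit:,}  Margin: {margin:.1%}")
print("Goal met." if profit > 420_000 else "Not yet. Try a lower cost.")
```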

Hint

You need profit = revenue - cost > 420,000, which implies cost < 830,000. Try cost = 820_000 and confirm.

Key insight

The last line uses a ternary expression: "Goal met." if profit > 420_000 else "Not yet. Try a lower cost.". This is Python’s one-line if/else, a compact way to choose between two values based on a condition. It is equivalent to a four-line if/else block and is widely used in data pipelines to attach labels and flags to rows of a DataFrame.

Decision Memo — Your First Analyst Output

An analyst’s job does not end with a chart, a notebook, or a printed number. It ends with a one-page memo that tells a decision-maker what to do and why. Charts support the memo; they do not replace it. Throughout this course you will package every analysis you write into the same structured template, so that by the time you sit in an internship the format is automatic. The discipline is straightforward: state the recommendation, marshal two or three numbers as evidence, name the assumptions that could overturn the recommendation, and propose the next test.

**To:**       <decision-maker>
**From:**     <your name>
**Subject:**  <one-line claim>
**Date:**     <date>

**Recommendation:**
One sentence saying *what should be done*.

**Evidence:**
2–3 bullet points with numbers — derived from the analysis.

**Caveats:**
1–2 bullet points naming the most important assumption(s) that, if wrong, would flip the recommendation.

**Next step:**
One sentence — what additional data or test would tighten the answer.

A worked example for the compound-interest exercise:

To: Future Me

From: A Year-2 ISOM 2600 student

Subject: Park summer-internship savings in a low-cost tracker fund until graduation

Date: 2026-05-16

Recommendation: Deposit the HK$10,000 of summer savings into a broad-market tracker fund and leave it untouched for the four years until graduation.

Evidence:

At a 7% annual return, the deposit compounds to HK$13,107.96 by graduation day — a 31% gain over the principal.

The year-by-year balance grows monotonically (HK$10,700, HK$11,449, HK$12,250, HK$13,108), so no rebalancing trigger arises during the holding period.

A four-year horizon comfortably exceeds the recommended minimum holding period for equity tracker funds, reducing the probability of withdrawing in a drawdown.

Caveats:

The calculation assumes a constant 7% annual return; realised equity returns are volatile and can be sharply negative in any given year.

Inflation is ignored; if HK CPI averages 2%, the real terminal value is closer to HK$12,110.

Next step: Re-run the calculation with a distribution of historical annual returns (e.g. Hang Seng or S&P 500, 1990–present) to report a 10th–90th percentile range alongside the point estimate.

Mistakes Library: The Mars Climate Orbiter (1999)

On 23 September 1999 NASA lost the Mars Climate Orbiter, a US$327 million probe, as it entered the Martian atmosphere at the wrong altitude and burned up. The cause was not a software crash but a units mismatch. Lockheed Martin’s ground-software team supplied thrust impulse in pound-force seconds; NASA’s navigation team’s software expected newton seconds. Both numbers looked sensible in isolation; neither team’s variable name carried the unit.

Python does not solve this problem for you. A variable named revenue carries no information about whether the figure is in HKD, USD, or RMB; pass revenue_hkd into a function that expects revenue_usd and Python will multiply happily. Two rules: put the unit in the variable name (distance_km, not distance; price_hkd, not price), and review every variable’s units when reading code someone else wrote.
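A sketch of the naming rule applied to the orbiter's units. The impulse value and function name are hypothetical; the conversion factor follows from 1 lbf = 4.4482216152605 N.

```python
LBF_S_TO_N_S = 4.4482216152605   # newton seconds per pound-force second


def to_newton_seconds(impulse_lbf_s):
    """Convert an impulse from pound-force seconds to newton seconds."""
    return impulse_lbf_s * LBF_S_TO_N_S


impulse_lbf_s = 100.0                              # hypothetical reading
impulse_n_s = to_newton_seconds(impulse_lbf_s)     # about 444.8

print(f"{impulse_lbf_s} lbf*s = {impulse_n_s:.1f} N*s")
```

With the unit in every name, passing impulse_lbf_s to code that expects impulse_n_s looks wrong to a reader even though Python itself would not complain.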

Chapter Summary

This chapter introduced the six foundational elements of Python that every business analytics practitioner uses daily. Here is a consolidated reference:

Key Takeaways

Variables and arithmetic: Python stores values in named variables. Use int for whole numbers and float for decimals. The f-string syntax (f"...") formats output readably. Watch out for floating-point precision in financial calculations.
Strings: Python 3 strings are Unicode by default, handling any human language. Core cleaning methods — .strip(), .lower(), .upper(), .split(), .replace() — are the foundation of data wrangling. F-strings (PEP 498) are the modern formatting standard.
Boolean logic: Named after George Boole (1854). Comparison operators (>, <, ==, !=, etc.) produce True/False. Combine with and, or, not. Directly equivalent to SQL WHERE clauses.
Conditional branching: if/elif/else routes program execution based on conditions. Conditions are evaluated top-to-bottom; only the first matching branch fires. Indentation is mandatory and meaningful in Python.
For loops: Repeat code over sequences. zip() iterates two lists in parallel. Time complexity is O(n) — for large datasets, switch to vectorised pandas/NumPy operations (Chapter 2).
Functions: Package logic for reuse following the DRY principle. Always write a docstring. Never use mutable objects as default arguments.

Operators Quick Reference

Category	Operators
Arithmetic	`+`, `-`, `*`, `/`, `//` (floor div), `%` (modulo), `**` (power)
Comparison	`==`, `!=`, `<`, `>`, `<=`, `>=`
Logical	`and`, `or`, `not`
Assignment	`=`, `+=`, `-=`, `*=`, `/=`

What’s Next

The remainder of the book treats this chapter’s vocabulary as common ground and builds analytical machinery on top of it. The next chapter shifts from single values to collections of values, which is where Python begins to look less like a calculator and more like a data platform. Before turning the page, take a moment to confirm that the four headings in the box below feel like familiar territory; if any of them still feels uncertain, revisit the corresponding section and re-run its code cell with your own numbers.

You now know

Variables, arithmetic, and f-strings.
Booleans and if/elif/else branching.
For loops and parallel iteration with zip().
Functions with docstrings and the DRY principle.

Chapter 1 introduces lists and list comprehensions, the pandas Series, NumPy arrays, SciPy statistical distributions, and your first plots with matplotlib. Together these tools scale every concept you have just met from single values to entire datasets, and they form the working vocabulary for the empirical work in the chapters that follow.

Debug Yourself: Counting Vowels

The cell below intends to count the vowels in the phrase "Business Analytics". It runs without raising an error, but the answer it produces is off by one. A hidden test at the end will tell you when you have fixed it. Try to spot the bug by reading the loop body carefully before you peek at the explanation underneath.

The trick: the membership test ch in vowels is case-sensitive, and vowels = "aeiou" contains only lowercase letters. The phrase "Business Analytics" has six vowels — u, i, e, A, a, i — but the uppercase A is invisible to the test, so the buggy version returns 5. The minimal fix is to compare against the lowercased character, e.g. if ch.lower() in vowels:, or equivalently to widen the alphabet to vowels = "aeiouAEIOU". Both produce the correct count of 6, and the hidden assertion will pass.
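The original cell is not reproduced here, so the following is only a reconstruction of what the buggy loop and its minimal fix might look like, based on the explanation above. The variable names buggy_count and fixed_count are assumptions.

```python
phrase = "Business Analytics"
vowels = "aeiou"

# Buggy version: the uppercase "A" never matches the lowercase-only vowels string.
buggy_count = 0
for ch in phrase:
    if ch in vowels:
        buggy_count += 1

# Fixed version: lowercase each character before the membership test.
fixed_count = 0
for ch in phrase:
    if ch.lower() in vowels:
        fixed_count += 1

print(buggy_count, fixed_count)  # 5 6
```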

Interactive Explorer: Compound Interest

The two values at the top of the cell control how a $10,000 deposit grows over time. Change the annual rate, change the horizon in years, then press Run and watch the curve and the terminal value update. Try a 5 % rate over 10 years for a baseline, then double the horizon to 20 years to see how compounding rewards patience, and finally raise the rate to 10 % to see how a higher return reshapes the curve into a sharper exponential.
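If the interactive cell is unavailable, the terminal values can be approximated with a few lines of plain Python. This sketch assumes interest compounds once a year, which may differ from what the explorer does, and the function name terminal_value is hypothetical.

```python
principal = 10_000  # the $10,000 deposit from the text


def terminal_value(principal, annual_rate, years):
    """Value of a deposit after `years`, assuming annual compounding."""
    return principal * (1 + annual_rate) ** years


# The scenarios suggested in the text, as (annual rate, horizon in years).
scenarios = [(0.05, 10), (0.05, 20), (0.10, 10)]

for rate, years in scenarios:
    value = terminal_value(principal, rate, years)
    print(f"{rate:.0%} for {years} years: ${value:,.2f}")
```

Under this assumption, doubling the horizon at 5 % lifts the final value from roughly $16,289 to roughly $26,533, a gain of more than 60 % rather than a doubling, because the second decade compounds on an already larger balance.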

Review Cards — Chapter 0

These cards are scheduled for spaced repetition. Click Show answer, rate how easily you recalled it, and the book will bring back the harder cards sooner and the easier ones after a longer interval. This is the same spaced-repetition technique that Anki offers to language learners and medical students. Your review history is stored privately in your browser; nothing is uploaded.

0.1 + 0.2 evaluates to 0.30000000000000004, not exactly 0.3. Floats follow the IEEE 754 binary standard, in which 0.1 and 0.2 cannot be represented exactly, so their sum carries a tiny rounding error. For financial reconciliation use round(x, 2) for display or the decimal.Decimal type for exact arithmetic.
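To check this card yourself, a minimal sketch using only the standard library is shown below. The variable names are illustrative.

```python
from decimal import Decimal

float_sum = 0.1 + 0.2
print(float_sum)            # 0.30000000000000004
print(float_sum == 0.3)     # False
print(round(float_sum, 2))  # 0.3

# Build Decimal values from strings so the inputs are exact to begin with.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))  # True
```

Constructing Decimal from a string matters: Decimal(0.1) would inherit the float's rounding error.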

round(0.5) returns 0, not 1. Python uses banker’s rounding (round-half-to-even, the IEEE 754 default) to avoid the upward bias that always-round-half-up introduces over large sums. Similarly round(2.5) returns 2, while round(1.5) and round(3.5) round up to even values 2 and 4.
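The behaviour can be confirmed in a few lines. The second half of this sketch shows one possible way to get conventional half-up rounding with the decimal module, in case a reporting convention requires it; that is an addition for illustration, not something the chapter prescribes.

```python
from decimal import Decimal, ROUND_HALF_UP

halves = [0.5, 1.5, 2.5, 3.5]
rounded = [round(x) for x in halves]
print(rounded)  # [0, 2, 2, 4]

# Optional: conventional "round half up" via the decimal module.
half_up = Decimal("2.5").quantize(Decimal("1"), rounding=ROUND_HALF_UP)
print(half_up)  # 3
```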

= is assignment — it stores a value into a name, as in price = 100. == is comparison — it tests whether two values are equal and returns a Boolean, as in price == 100. Writing if price = 100: is a syntax error; the conditional must use ==.

It returns None. Any Python function without an explicit return statement silently returns None. This is a frequent source of surprise for newcomers — for example, result = my_list.sort() makes result equal to None because .sort() mutates the list in place.
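A small sketch makes both cases visible. The function log_sale and the sample prices are hypothetical.

```python
def log_sale(amount):
    """Print a sale amount; there is deliberately no return statement."""
    print(f"Sale recorded: {amount}")


result = log_sale(250)
print(result)  # None

prices = [30, 10, 20]
in_place = prices.sort()          # sorts prices itself and returns None
new_list = sorted([30, 10, 20])   # sorted() returns a new list instead

print(in_place, prices, new_list)
```

When a sorted copy is needed, sorted() is the safer choice; .sort() is for changing the existing list.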

The five integers 0, 1, 2, 3, 4 — range(5) is half-open: it starts at 0 and stops before 5. To get 1, 2, 3, 4, 5 instead, write range(1, 6). This zero-based, end-exclusive convention matches list indexing, where my_list[0] is the first element and my_list[len(my_list)] is out of range.
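The half-open convention can be verified directly. The list contents below are arbitrary.

```python
zero_based = list(range(5))
one_based = list(range(1, 6))
print(zero_based)  # [0, 1, 2, 3, 4]
print(one_based)   # [1, 2, 3, 4, 5]

my_list = ["a", "b", "c"]
last_item = my_list[len(my_list) - 1]  # the last valid index is len - 1

# Indexing at len(my_list) goes one past the end and raises IndexError.
try:
    my_list[len(my_list)]
    out_of_range = False
except IndexError:
    out_of_range = True

print(last_item, out_of_range)
```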

'5' is a string (a sequence of characters) and 2 is an int; Python refuses to combine them because + means concatenation for strings and addition for numbers. Convert explicitly: int('5') + 2 gives 7, or '5' + str(2) gives '52'. The fix depends on whether you want arithmetic or text concatenation.
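The sketch below triggers the error safely inside a try block and then shows both explicit conversions. The variable names are illustrative.

```python
# Mixing str and int with + raises TypeError.
try:
    '5' + 2
    raised = False
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")

as_number = int('5') + 2   # arithmetic: 7
as_text = '5' + str(2)     # concatenation: '52'

print(as_number, as_text)
```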

Successive calls return [1], then [1, 1], then [1, 1, 1], because the default list is created once at function definition time and shared across every call that uses the default. This is the notorious mutable default argument trap. The fix is to use None as the default and create the list inside the function: declare def f(x=None):, set x = [] when x is None, then call x.append(1) and return x as before.
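Written out as runnable code, the trap and its fix might look like the sketch below. The card's function is assumed to be the usual def f(x=[]) form; the names f_buggy and f_fixed are introduced here only to show both side by side.

```python
def f_buggy(x=[]):       # the default list is created once, at definition time
    x.append(1)
    return x


def f_fixed(x=None):
    if x is None:
        x = []           # a fresh list on every call
    x.append(1)
    return x


# Copy each buggy result, because every call returns the same shared list.
buggy_results = [list(f_buggy()) for _ in range(3)]
fixed_results = [f_fixed() for _ in range(3)]

print(buggy_results)  # [[1], [1, 1], [1, 1, 1]]
print(fixed_results)  # [[1], [1], [1]]
```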
