Chapter 0: Basics of Python

About This Chapter

Business analytics sits at the intersection of data, computation, and decision-making. Before you can work with datasets, build models, or visualise trends, you need a reliable language to talk to a computer — and that language is Python.

Why Python?

Analysts today have several tools at their disposal: Excel is ubiquitous but breaks at scale and offers no reproducibility; R is superb for statistics but carries a steeper syntax curve; Stata and SAS are powerful for econometrics but are expensive, proprietary, and slow to iterate. Python threads all these needles. It is free, open-source, readable almost like English, and backed by one of the largest developer communities in the world. When JPMorgan wants to automate risk reports, when Netflix wants to segment subscribers, when HSBC wants to flag fraudulent transactions — Python is the language they reach for.

A Brief History

Python was created by Guido van Rossum, a Dutch computer scientist, and first released in 1991. Van Rossum named it after the BBC comedy series Monty Python’s Flying Circus — a deliberate signal that programming should be fun and accessible, not arcane. The language was designed with a single guiding philosophy: There should be one obvious way to do it. This philosophy, codified in PEP 20 (“The Zen of Python”), makes Python code readable by people who did not write it — a critical property in team-based analytics.

Python 3, released in December 2008, broke backward compatibility to fix fundamental design mistakes (integer division, Unicode handling, and more). Python 2 nonetheless stayed in wide use for roughly another decade before reaching end-of-life in January 2020. All modern analytics work, including this course, uses Python 3.

Python in the Data Science Ecosystem

Python’s strength in analytics is not the language itself but the libraries built on top of it:

Library               Purpose
NumPy                 Fast array arithmetic; the foundation of almost everything else
pandas                Tabular data: think Excel with programming
matplotlib / seaborn  Publication-quality visualisations
scikit-learn          Machine learning in 5 lines of code
statsmodels           Econometrics and statistical testing
scipy                 Scientific computing, distributions, optimisation

Together these form the PyData stack, used by researchers and practitioners worldwide. When you import pandas, you are accessing 15 years of engineering effort for free.

What You Will Learn in This Chapter

This chapter covers the six foundational ideas every Python user must know cold:

  1. Variables and arithmetic — storing and computing with numbers
  2. Strings — representing and manipulating text
  3. Boolean logic — asking yes/no questions about data
  4. Conditional branching (if/elif/else) — making decisions in code
  5. For loops — repeating an action over many items
  6. Functions — packaging logic for reuse

Each section pairs a conceptual explanation with a live, runnable code cell. Modify the code, press Shift+Enter, and see the results immediately — no installation required.

Prerequisites

None. You do not need any prior programming experience. If you have used Excel formulas, you already understand the core idea: you give the computer a rule, it applies it to data. Python is that idea, generalised to unlimited complexity.


Welcome to Python

Python is among the most widely used languages for business analytics. Everything in this chapter runs live in your browser: no install, no setup.

How to use these slides
  • Edit any code cell, then click the ▶ Run button to execute
  • Press Shift+Enter while inside a cell to run it
  • All output appears below the code, just like Jupyter

Your First Line of Python

The print() function is the simplest way to ask Python to show you something. It sounds trivial, but the ability to inspect values at any point in your code is the single most important debugging skill you will develop. Professional analysts use print() (or its notebook equivalent) constantly — to sanity-check intermediate results, to confirm that data loaded correctly, and to trace logic errors.

Notice the quotation marks around "Hello, Business Analytics!". In Python, anything inside quotes is a string — a sequence of characters treated as text, not as a number or instruction. We will cover strings in depth shortly.
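A minimal sketch of such a first cell, assuming the greeting quoted above. The variable name message is an illustrative choice, not something the chapter prescribes.

```python
# Everything inside the quotes is a string: text, not a number or an instruction.
message = "Hello, Business Analytics!"

# print() writes the value to the output area below the cell.
print(message)
```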

Interpretation: The output Hello, Business Analytics! confirms that Python is running, the Pyodide engine has loaded, and your browser can execute Python code. If you see an error instead, something went wrong with the page load — refresh and try again.

Edit the line above and run again.

Print is your best friend

Get in the habit of printing intermediate results as you build any analysis. In production code you might remove them, but during development, print() is how you verify that each step does what you expect.

In practice

When a bank loads a 10-million-row transaction file, the first thing an analyst does is print(df.shape) and print(df.head()) — the Python equivalent of checking that the file opened correctly before running any analysis.

Variables & Arithmetic

Background

A variable is a named container for a value. When you write revenue = 1_000_000, you are telling Python to reserve a location in the computer’s memory, store the integer 1,000,000 there, and label that location revenue so you can refer to it later.

Python distinguishes between several numeric types. The two you will encounter most often are:

  • int (integer): whole numbers such as 1, 42, -7, 1_000_000. Underscores in integer literals (like 1_000_000) are purely cosmetic: they make large numbers readable, like a comma separator, but have no computational effect.
  • float (floating-point number): numbers with a decimal component — 1.5, 0.075, -3.14. Floats follow the IEEE 754 standard, which means they are stored as binary fractions. This is why 0.1 + 0.2 in Python gives 0.30000000000000004 rather than exactly 0.3. For most business calculations the error is negligible, but it matters in financial reconciliation — a topic Chapter 3 addresses.

In business analytics, variables naturally map to financial line items: revenue, cost, headcount, interest rate. Giving them meaningful names (gross_margin rather than gm) is not just style — it is documentation.
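The ideas above can be sketched in a few lines. The figures are assumptions chosen to match the numbers discussed in this section (revenue of 1,000,000 and cost of 750,000); the variable names are illustrative.

```python
# Illustrative figures: revenue of 1,000,000 and cost of 750,000.
revenue = 1_000_000      # int; the underscores are only for readability
cost = 750_000           # int

profit = revenue - cost              # int arithmetic stays exact
gross_margin = profit / revenue      # "/" always returns a float in Python 3

print(f"Profit: ${profit:,}")                # ":," adds thousands separators
print(f"Gross margin: {gross_margin:.1%}")   # ":.1%" shows a percentage with one decimal

# Floats are binary fractions, so some decimal sums are very slightly off.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)               # False
print(abs(float_sum - 0.3) < 1e-9)    # True: compare with a tolerance instead
```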

Interpretation: The output shows a profit of $250,000 and a gross margin of 25.0%. In retail, a 25% gross margin is modest — supermarkets run 20–30% while software companies exceed 70%. For a manufacturing firm, 25% would be healthy. Context determines whether a margin figure is a red flag or a green light. The f"..." syntax (an f-string, introduced in Python 3.6 via PEP 498) lets you embed variable values directly in text, with :, adding thousands separators and :.1% formatting a decimal as a percentage rounded to one decimal place.

Common pitfall: integer division

In Python 2, 250000 / 1000000 returned 0 because dividing two integers gave an integer result (truncated). In Python 3, the same expression returns 0.25 as expected. Always use Python 3. If you ever need integer (floor) division explicitly, use the // operator: 7 // 2 gives 3.
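A short sketch of the two division operators in Python 3. The variable names are illustrative.

```python
true_division = 250_000 / 1_000_000   # 0.25: "/" returns a float
floor_division = 7 // 2               # 3: "//" discards the fractional part
negative_floor = -7 // 2              # -4: "//" rounds down, not toward zero
remainder = 7 % 2                     # 1: "%" returns what is left over

print(true_division, floor_division, negative_floor, remainder)
```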

In practice

At JPMorgan’s equity research desk, analysts write Python scripts that pull quarterly financials for hundreds of companies and compute margin ratios automatically — exactly this calculation, scaled to thousands of tickers. The script that runs in 2 seconds in Python would require hours of manual spreadsheet work.

Rounding — round(value, ndigits)

Most business numbers don’t need 17 decimal places. round(value, ndigits) returns value rounded to ndigits digits after the decimal point. If you omit ndigits you get a whole number. Negative ndigits rounds to the left of the decimal — useful for “round to the nearest thousand.”

Two ways to control digits
  • round(x, n): returns a new rounded number that you can store and reuse. The variable x itself changes only if you reassign it, e.g. x = round(x, n).
  • f"{x:.2f}": produces a string formatted for display only; no number is changed.

For final reports use f"..." (display rounding). Use round() when the rounded number itself will be reused — e.g. a price that must be quoted in cents, or a share count that must be a whole number.
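A small sketch contrasting the two approaches. The price is a hypothetical value picked for illustration.

```python
unit_price = 189.4567                      # hypothetical raw price

rounded_price = round(unit_price, 2)       # 189.46, a new float
whole_units = round(unit_price)            # 189, an int because ndigits is omitted
nearest_thousand = round(1_234_567, -3)    # 1235000, rounded to the left of the decimal point

display_label = f"{unit_price:.2f}"        # the string "189.46", for display only

print(rounded_price, whole_units, nearest_thousand, display_label)
print(unit_price)                          # still 189.4567: nothing modified the original
```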

Banker’s rounding (the .5 surprise)

Python uses banker’s rounding: 0.5 rounds toward the nearest even number, not always up.

Round-half-to-even is also the default rounding mode in IEEE 754. Over thousands of rounded numbers it removes the upward bias that “always round half up” introduces, which is why it is the standard in banking and statistics. If you genuinely need “always round half up” (e.g. for a tax calculation that has to match a specific regulator’s rule), use math.floor(x + 0.5) or, more robustly, decimal.Decimal with an explicit rounding mode such as ROUND_HALF_UP.
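The sketch below compares the built-in round() with decimal.Decimal and an explicit rounding mode. The sample values are illustrative; converting through str() is one simple way to build the Decimal from the float as typed.

```python
from decimal import Decimal, ROUND_HALF_UP

halves = [0.5, 1.5, 2.5, 3.5]

bankers = []    # results from the built-in round()
half_up = []    # results from Decimal with ROUND_HALF_UP

for x in halves:
    bankers.append(round(x))
    exact = Decimal(str(x))                                    # e.g. Decimal("2.5")
    half_up.append(int(exact.quantize(Decimal("1"), rounding=ROUND_HALF_UP)))

print(bankers)   # [0, 2, 2, 4]: ties go to the even neighbour
print(half_up)   # [1, 2, 3, 4]: ties always go up
```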

Strings — Working with Text

Background

In computing, a string is a sequence of characters. Python 3 strings are sequences of Unicode characters (code points), which means a single string can contain English letters, Chinese characters (中文), Arabic script, mathematical symbols (∑, π), and emoji, all in the same variable. This is critical for global business: a Hong Kong company’s CRM might contain customer names in both Traditional Chinese and English, and Python handles both without special configuration.

String manipulation is one of the most common tasks in real-world analytics. Raw business data is almost never clean:

  • Company names have inconsistent capitalisation (“apple inc”, “Apple Inc.”, “APPLE INC”)
  • Dates arrive as text (“2024-01-15”, “Jan 15 2024”, “15/01/24”)
  • CSV files embed extra spaces or line breaks in fields

Before you can analyse textual data, you must clean it. Python’s built-in string methods — .strip(), .lower(), .upper(), .split(), .replace(), .startswith() — are the workhorses of that cleaning process.

f-strings (PEP 498, Python 3.6, released December 2016) are the modern way to embed variables inside text. Before f-strings, analysts used % formatting or .format(), both clunkier. An f-string prefixes the opening quote with f and uses {variable} placeholders, optionally followed by a format spec (:,.2f for comma-separated, 2-decimal-place float).
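A minimal sketch of these methods in action. The ticker, price, and sample strings are assumptions chosen to line up with the outputs discussed in this section; the three tickers in the comma-separated string are placeholders.

```python
ticker = "aapl"
price = 189.5

# f-string: .upper() normalises the ticker, ":,.2f" formats the price.
headline = f"{ticker.upper()} closed at ${price:,.2f}"
print(headline)                            # AAPL closed at $189.50

school = "HKUST".lower()                   # "hkust": normalised for matching
cleaned = "  hello  ".strip()              # "hello": surrounding spaces removed
tickers = "AAPL,MSFT,NVDA".split(",")      # a list of three strings

no_dot = "Apple Inc.".replace(".", "")     # "Apple Inc"
is_apple = "AAPL".startswith("AA")         # True

print(school, cleaned, tickers, no_dot, is_apple)
```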

Interpretation: The first line outputs AAPL closed at $189.50 — a properly formatted market data label. .upper() ensures the ticker is always in canonical uppercase regardless of how it was stored. The .lower() result (hkust) shows how to normalise case for comparison or matching. .strip() removes the two leading and trailing spaces — invisible but fatal if you try to match " hello " against "hello". .split(",") converts a comma-separated string into a Python list of three elements: this is exactly how you parse a CSV row or a list of tickers received from an API.

Common pitfall: == vs is for strings

Use == to compare string values, and never use is. The expression "aapl" == "aapl" is always True. The expression "aapl" is "aapl" may or may not be True depending on Python’s internal string interning — an implementation detail you should never rely on. Reserve is for checking whether something is None.
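A short sketch of the distinction. The second string is built at run time so that it holds the same characters without being written as the same literal; the names are illustrative.

```python
stored = "aapl"
typed = "".join(["aa", "pl"])    # same characters, assembled at run time

print(stored == typed)           # True: the values match

# "stored is typed" would ask whether both names refer to the very same object.
# The answer depends on string interning, so the code does not rely on it.

missing_price = None
print(missing_price is None)     # True: the idiomatic way to check for None
```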

In practice

Before importing customer records into a CRM or database, data engineers run a cleaning pipeline in Python. Typical steps: .strip() to remove whitespace, .title() to standardise capitalisation, .replace() to fix known typos, and a regex to validate email formats. This process — called data wrangling — consumes an estimated 60–80% of a data analyst’s time on real projects.
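The cleaning steps listed above might look roughly like this. Everything here is a sketch under assumptions: the sample records are made up, the "Inc." replacement stands in for fixing a known typo, and the email pattern is a deliberately loose check rather than a full validator.

```python
import re

# Loose plausibility check: something@something.something, with no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

raw_name = "  apple inc.  "              # hypothetical messy record
raw_email = " Jane.Doe@Example.com "     # hypothetical messy record

# Chain the string methods: trim, standardise capitalisation, fix a known variant.
clean_name = raw_name.strip().title().replace("Inc.", "Inc")

# Emails are trimmed and lower-cased before validation.
clean_email = raw_email.strip().lower()
email_ok = EMAIL_PATTERN.match(clean_email) is not None

print(clean_name)              # Apple Inc
print(clean_email, email_ok)   # jane.doe@example.com True
```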

Boolean Logic

Background

Boolean logic is named after George Boole (1815–1864), an English mathematician who showed that logical reasoning could be expressed algebraically using only two values: True and False. His 1854 work An Investigation of the Laws of Thought laid the theoretical foundation for digital computing — every bit in your computer is a Boolean.

In Python, bool is a type with exactly two possible values: True and False. Boolean expressions are produced by comparison operators:

Operator  Meaning                 Example
>         greater than            price > 50
<         less than               volume < 100
>=        greater than or equal   margin >= 0.20
<=        less than or equal      debt_ratio <= 0.5
==        equal                   sector == "Tech"
!=        not equal               rating != "Junk"

Multiple Boolean expressions combine with and, or, and not. These map directly to SQL WHERE clauses — WHERE price > 50 AND volume > 1000000 is identical in meaning to price > 50 and volume > 1_000_000. If you have written database queries before, you already understand Boolean logic; Python just uses different syntax.
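A minimal sketch of a two-condition screen, assuming a price of 100 and a volume of 5,000,000 as in the discussion that follows. The variable names are illustrative.

```python
price = 100
volume = 5_000_000

is_expensive = price > 50            # True
is_liquid = volume > 1_000_000       # True

both = is_expensive and is_liquid    # True only if every condition holds
either = is_expensive or is_liquid   # True if at least one condition holds
neither = not either                 # flips True to False

print(is_expensive, is_liquid, both, either, neither)
```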

Interpretation: Both conditions are True — the price (100) exceeds 50, and the volume (5,000,000) exceeds 1,000,000. Consequently, both and (which requires all conditions to be True) and or (which requires at least one) return True. In a stock screening context, combining conditions with and narrows the candidate set (high-price and high-volume stocks), while or broadens it. Most quant screens apply multiple and filters to isolate a small, focused list of candidates.

Common pitfall: = vs ==

A single equals sign = is assignment: it stores a value. A double equals sign == is comparison: it tests equality and returns a Boolean. Writing if price = 100: is a syntax error. This is one of the most common mistakes for beginners coming from Excel or SQL, where = also serves as the equality test.

In practice

Credit approval systems at banks evaluate hundreds of Boolean conditions in real time: credit_score >= 680 and debt_to_income <= 0.43 and employment_years >= 2. A loan application passes or fails based on the combined Boolean result. This logic — once coded manually by a team of analysts — runs in Python (or a Python-based rules engine) and processes millions of applications per day.
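The three-condition rule quoted above can be sketched directly. The thresholds come from the text; both applicants are hypothetical.

```python
# Hypothetical applicant A
credit_score = 702
debt_to_income = 0.38
employment_years = 3

approved_a = (
    credit_score >= 680
    and debt_to_income <= 0.43
    and employment_years >= 2
)

# Hypothetical applicant B: identical except for a higher debt-to-income ratio.
debt_to_income_b = 0.51
approved_b = (
    credit_score >= 680
    and debt_to_income_b <= 0.43
    and employment_years >= 2
)

print(approved_a, approved_b)   # True False: one failed condition rejects the application
```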

If / Elif / Else

Background

Conditional branching is how a program makes decisions. The conceptual ancestor is the decision tree — a flowchart that routes inputs to different outcomes depending on logical tests. Python’s if/elif/else structure is that flowchart expressed in code.

The structure reads naturally: If condition A is true, do X. Else if condition B is true, do Y. Otherwise, do Z. Python uses indentation (four spaces by convention) to delimit which code belongs to each branch — there are no curly braces {}. This is one of Python’s most distinctive design choices and makes code visually clean.

A few practical guidelines:

  • Order matters. Python evaluates conditions top to bottom and stops at the first True match. Put your most specific (narrowest) conditions first.
  • elif vs nested if. Use elif when the conditions are mutually exclusive (only one branch should fire). Use separate if statements when multiple conditions can independently apply.
  • Dictionary lookup as an alternative. For simple label mappings with many cases, a dictionary can be cleaner and faster than a long elif chain. But for range-based conditions (like the return thresholds below), if/elif is the right tool.
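A sketch of a range-based signal rule of the kind discussed in this section. Only the 2% threshold for STRONG BUY and the SELL fallback are taken from the text; the two middle branches and their labels are assumptions added to show how a longer chain reads.

```python
daily_return = 0.025    # 2.5% daily return

# Conditions are checked top to bottom; the first True branch wins.
if daily_return > 0.02:
    signal = "STRONG BUY"
elif daily_return > 0:          # assumed intermediate threshold
    signal = "BUY"
elif daily_return > -0.02:      # assumed intermediate threshold
    signal = "HOLD"
else:
    signal = "SELL"

print(f"Return {daily_return:.1%} -> {signal}")
```

Setting daily_return to -0.03 fails every test in the chain, so the else branch assigns SELL.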

Interpretation: A daily return of 2.5% exceeds the first threshold (2%), so the signal is STRONG BUY. In practice, a 2.5% single-day gain is a significant move for a large-cap stock — the S&P 500 averages about 0.05% per day. A rule like this would need to be calibrated to the asset class and volatility regime; a 2.5% return is unremarkable for a cryptocurrency but extraordinary for a 10-year Treasury bond.

Try it: change daily_return to -0.03 and re-run. The else branch fires and signals SELL.

Common pitfall: forgetting indentation

Python is one of the few languages where indentation is syntactically required, not just a style choice. If you write the body of an if block without indenting it, Python raises an IndentationError. The standard is 4 spaces per level. Never mix tabs and spaces — it causes hard-to-debug errors.

In practice

Risk rating systems at insurance companies and rating agencies (Moody’s, S&P) apply exactly this branching logic: if a company’s interest coverage ratio exceeds 8, assign AAA; elif above 4, assign AA; and so on down to distressed ratings. The logic is simple; the challenge is computing the input ratios correctly from financial statements — which is where Python’s data-handling libraries come in.

For Loops

Background

A for loop repeats a block of code once for each item in a sequence. This is the fundamental mechanism of automation: instead of copying and pasting the same calculation 1,000 times, you write it once and let the loop handle the repetition.

From a computer-science perspective, a loop over a list of n items has O(n) time complexity — the work grows linearly with the number of items. If processing one item takes 1 millisecond, processing 1,000 takes ~1 second, and 1,000,000 takes ~17 minutes. For very large datasets you would replace Python loops with vectorised NumPy operations (which run at C speed) or pandas methods — but understanding loops is the prerequisite to understanding why vectorisation matters.

Python’s zip() function pairs up two sequences element-by-element, letting you iterate over multiple lists simultaneously without manual index management. It is one of Python’s most elegant built-in utilities.

Key loop patterns you will use repeatedly:

Pattern                          Purpose
for item in list:                Iterate over items
for i, item in enumerate(list):  Iterate with an index counter
for a, b in zip(list1, list2):   Iterate over two lists in parallel
[expr for item in list]          List comprehension (Chapter 1)
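A minimal sketch of the zip() and enumerate() patterns on a four-stock price table. The AAPL and NVDA prices are the ones quoted in this chapter; MSFT and GOOGL and their prices are placeholders chosen so that the total comes to $1,654.

```python
tickers = ["AAPL", "MSFT", "GOOGL", "NVDA"]
prices = [189.50, 415.00, 174.50, 875.00]   # MSFT and GOOGL are placeholder values

total = 0.0
# zip() pairs each ticker with its price, so no index bookkeeping is needed.
for ticker, price in zip(tickers, prices):
    print(f"{ticker:6s} ${price:>8,.2f}")   # ":6s" pads the ticker to 6 characters
    total += price

print(f"{'TOTAL':6s} ${total:>8,.2f}")

# enumerate() supplies a running counter alongside each item.
for position, ticker in enumerate(tickers, start=1):
    print(position, ticker)
```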

Interpretation: The loop prints a price table for four technology stocks — a portfolio snapshot. The total portfolio value (assuming 1 share of each) is $1,654. Notice that NVDA at $875 is the most expensive and dominates a fixed-share portfolio. A $1 investment in each would be different from 1 share in each — this is the difference between equal-weight and price-weight portfolio construction, a topic that recurs in Chapter 3. The :6s format spec in the f-string pads ticker to 6 characters, aligning the columns.

Common pitfall: modifying a list while looping over it

Never add or remove items from a list while you are iterating over it with a for loop. Python will not raise an error for a list, but the loop tracks its position by index, so removing or inserting items mid-loop leads to silent bugs where items are skipped or processed twice. If you need to filter a list, build a new one using a list comprehension or filter(), then reassign.
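A small demonstration of the problem and the safer alternative. The data are made up: -1 marks a bad price that should be dropped.

```python
# Risky: removing items from the list being looped over.
prices = [10, -1, -1, 20]
for p in prices:
    if p < 0:
        prices.remove(p)    # the list shrinks, so the next item is skipped

print(prices)               # [10, -1, 20]: one bad value survived

# Safer: build a new list and keep the original untouched.
raw_prices = [10, -1, -1, 20]
valid_prices = [p for p in raw_prices if p >= 0]

print(valid_prices)         # [10, 20]
```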

In practice

A quantitative analyst processing end-of-day data for 500 S&P 500 components writes a loop that fetches each company’s financial ratios, applies a scoring model, and appends the result to a summary list. This nightly batch job — a loop over 500 tickers — might run in under a minute using pandas. Without Python, the same task would require a team of analysts working for hours.

Functions

Background

A function is a named, reusable block of code. The core principle is DRY — Don’t Repeat Yourself. If you find yourself writing the same calculation in three places, you should write it once as a function and call it three times.

Functions embody the modularity principle in software design: break a complex problem into small, well-named, independently testable pieces. Bertrand Meyer, the computer scientist who formulated Design by Contract, argued for command-query separation: a function should either do something (a command, with side effects) or answer something (a query, returning a value) — not both. In analytics, most functions are queries: they take data in and return a result.

The anatomy of a Python function:

def function_name(parameter1, parameter2):
    """Docstring: explain what the function does."""
    # body: the calculation
    return result

Key points:

  • def declares the function; indented code is its body
  • Parameters are local variables: they exist only inside the function
  • The docstring (triple-quoted string immediately after def) is machine-readable documentation; help(function_name) displays it
  • return sends a value back to the caller; without it, the function returns None
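A sketch of a position_value() function following this anatomy. The function name and the hardcoded 0.1% commission are mentioned in this chapter; the parameter names and the exact body are assumptions.

```python
def position_value(shares, price):
    """Return the value of a position net of a 0.1% brokerage commission."""
    gross = shares * price     # value before fees
    fee = gross * 0.001        # 0.1% commission, hardcoded in this sketch
    return gross - fee

aapl_net = position_value(100, 189.50)
nvda_net = position_value(50, 875.00)

print(f"AAPL: ${aapl_net:,.2f}")   # AAPL: $18,931.05
print(f"NVDA: ${nvda_net:,.2f}")   # NVDA: $43,706.25
```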

Interpretation: The function computes the net value of a stock position after deducting a 0.1% brokerage commission. Buying 100 shares of AAPL at $189.50 costs $18,950 gross; after the $18.95 fee, the net value is $18,931.05. Buying 50 shares of NVDA at $875 costs $43,750 gross; net $43,706.25. The NVDA position is roughly 2.3× larger, which is relevant for thinking about portfolio concentration. The 0.1% fee is realistic for retail brokers; institutional traders often pay 0.01–0.03%.

Common pitfall: mutable default arguments

Never use a mutable object (list, dictionary) as a default argument value. The expression def f(data=[]): creates one list object at function definition time, shared across all calls that use the default. Instead, write def f(data=None): and make the first statement of the body if data is None: data = []. This is one of the most notorious Python gotchas and affects even experienced programmers.
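A short demonstration of the shared-default problem and the None-based fix. The function and variable names are hypothetical.

```python
def add_ticker_buggy(ticker, watchlist=[]):
    """Append to a watchlist. The default list is created once and then shared."""
    watchlist.append(ticker)
    return watchlist

first = add_ticker_buggy("AAPL")
second = add_ticker_buggy("MSFT")
print(second)             # ['AAPL', 'MSFT']: the first call leaked into the second
print(first is second)    # True: both names refer to the same list object


def add_ticker(ticker, watchlist=None):
    """Append to a watchlist, creating a fresh list when none is supplied."""
    if watchlist is None:
        watchlist = []
    watchlist.append(ticker)
    return watchlist

safe_first = add_ticker("AAPL")
safe_second = add_ticker("MSFT")
print(safe_first, safe_second)   # ['AAPL'] ['MSFT']
```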

In practice

A reusable position_value() function is the building block of a portfolio management system. Write it once, test it thoroughly, and call it from your backtesting engine, your risk dashboard, and your order execution module. If the broker changes its fee structure from 0.1% to 0.05%, you update one line — not dozens of spreadsheet cells scattered across multiple files.

Try It Yourself

This exercise puts together everything from the chapter. You have a revenue of $1,000,000 and a cost of $750,000 — a 25% margin. The challenge is to reduce costs enough to push profit above $300,000 (a 30% margin).

This may look like a toy problem, but the underlying question — how much cost reduction is needed to hit a target margin? — is asked in every boardroom during budget season. The only difference at scale is that revenue and cost are computed from thousands of rows of data rather than entered manually.

Modify the cost to make profit > $300k:

Click ▶ Run after editing.

Hint

You need profit = revenue - cost > 300,000, which means cost < 700,000. Try cost = 690_000 and verify.

Key insight

Notice the ternary expression on the last line: "✅ Goal met!" if profit > 300_000 else "❌ Try again". This is Python’s one-line if/else — a compact way to choose between two values based on a condition. It is equivalent to a three-line if/else block and is widely used in data pipelines to create labels and flags.
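A minimal sketch of how the exercise cell might read once the hint's value is applied. The revenue, the starting cost, and the two status strings are taken from this section; the variable names and print layout are assumptions.

```python
revenue = 1_000_000
cost = 690_000           # the hint's suggestion; the exercise starts at 750_000

profit = revenue - cost
margin = profit / revenue

# One-line if/else: picks one of two values based on the condition.
status = "✅ Goal met!" if profit > 300_000 else "❌ Try again"

print(f"Profit: ${profit:,} ({margin:.1%})")
print(status)
```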

Chapter Summary

This chapter introduced the six foundational elements of Python that every business analytics practitioner uses daily. Here is a consolidated reference:

Key Takeaways

  • Variables and arithmetic: Python stores values in named variables. Use int for whole numbers and float for decimals. The f-string syntax (f"...") formats output readably. Watch out for floating-point precision in financial calculations.

  • Strings: Python 3 strings are Unicode by default, handling any human language. Core cleaning methods — .strip(), .lower(), .upper(), .split(), .replace() — are the foundation of data wrangling. F-strings (PEP 498) are the modern formatting standard.

  • Boolean logic: Named after George Boole (1854). Comparison operators (>, <, ==, !=, etc.) produce True/False. Combine with and, or, not. Directly equivalent to SQL WHERE clauses.

  • Conditional branching: if/elif/else routes program execution based on conditions. Conditions are evaluated top-to-bottom; only the first matching branch fires. Indentation is mandatory and meaningful in Python.

  • For loops: Repeat code over sequences. zip() iterates two lists in parallel. Time complexity is O(n) — for large datasets, switch to vectorised pandas/NumPy operations (Chapter 2).

  • Functions: Package logic for reuse following the DRY principle. Always write a docstring. Never use mutable objects as default arguments.

Operators Quick Reference

Category    Operators
Arithmetic  +, -, *, /, // (floor div), % (modulo), ** (power)
Comparison  ==, !=, <, >, <=, >=
Logical     and, or, not
Assignment  =, +=, -=, *=, /=

What’s Next

You now know
  • Variables, arithmetic, f-strings
  • Booleans and if/elif/else
  • For loops and zip()
  • Functions with docstrings

Chapter 1 → Lists and list comprehensions, pandas Series, NumPy arrays, SciPy statistics, and your first plots with matplotlib. These tools scale everything you learned here from single values to entire datasets.

Suggested Further Reading

All of the following are free and beginner-friendly:

  1. Python official tutorial — docs.python.org/3/tutorial. The authoritative source. Chapters 3–5 cover everything in this chapter in more depth.
  2. “Automate the Boring Stuff with Python” by Al Sweigart — automatetheboringstuff.com. Free online. Practical, business-oriented examples.
  3. Real Python — realpython.com. High-quality tutorials with a professional slant. Articles on f-strings, loops, and functions are particularly good.
  4. “Python for Data Analysis” by Wes McKinney (O’Reilly) — the definitive pandas textbook, written by pandas’ creator. Chapter 2 covers Python fundamentals from an analytics perspective.
  5. PEP 20 — The Zen of Python — run import this in any Python interpreter to read the 19 guiding aphorisms. Understanding them will make you a better Python programmer.

Practice suggestion

Before moving to Chapter 1, try rewriting the position_value() function to also accept a commission rate as a parameter (instead of hardcoding 0.001), and use it in a loop over the four tickers from the For Loops section to print a complete position-value table. This single exercise touches every concept in this chapter.

Prof. Xuhu Wan · HKUST ISOM · Intro to Business Analytics