Logging Practices: Exception Handling and Message Formatting

Some content was generated with the assistance of Gemini 3 and edited. If you notice any mistakes, I would really appreciate you letting me know.

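The purpose of a log is to help you fix bugs quickly. So how do you write logs that let you find and fix errors faster? Below are several common logging practices. (The examples use Python, but the concepts mostly carry over to other languages.)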

Exception Handling

Inside an except block, you must preserve the full stack trace.

Do: Use `logger.exception` to keep the stack trace

Inside a try...except block, prefer logger.exception("Message").

(It is equivalent to logger.error(..., exc_info=True) and automatically attaches the original traceback to the log.)

import logging
logger = logging.getLogger(__name__)

try:
    result = 1 / 0
except ZeroDivisionError:
    # ✅ Correct: the log will include "Traceback (most recent call last)..."
    logger.exception("Calculation failed")
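
To check the equivalence described above, here is a minimal sketch that sends a logger's output to an in-memory buffer and logs the same failure three ways. The logger name `demo.exception` and the helper `capture_log` are hypothetical names made up for this illustration.

```python
import io
import logging


def capture_log(log_call):
    """Run log_call(logger) inside an except block and return the emitted text."""
    buffer = io.StringIO()
    logger = logging.getLogger("demo.exception")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the demo output out of the root logger
    handler = logging.StreamHandler(buffer)
    logger.addHandler(handler)
    try:
        try:
            1 / 0
        except ZeroDivisionError:
            log_call(logger)
    finally:
        logger.removeHandler(handler)
    return buffer.getvalue()


with_exception = capture_log(lambda log: log.exception("Calculation failed"))
with_exc_info = capture_log(lambda log: log.error("Calculation failed", exc_info=True))
plain_error = capture_log(lambda log: log.error("Calculation failed"))

print("Traceback" in with_exception)  # True
print("Traceback" in with_exc_info)   # True
print("Traceback" in plain_error)     # False
```

Only the plain `logger.error` call loses the traceback; the other two carry it.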

Don’t: Use `logger.error` alone

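Using logger.error alone prints only your message (for example, “Something went wrong”). You cannot see which line of code failed, which makes debugging difficult.

If you insist on using logger.error, it is strongly recommended to enable exc_info: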

except Exception:
    # Same as `logger.exception`
    logger.error("Failed to process data", exc_info=True)

Passing exc_info=True is typically useful when you want the log level to be WARNING or CRITICAL but still want to keep the traceback (for example, logger.warning(…, exc_info=True)).
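
As a rough illustration of that case, the sketch below logs a recoverable parsing failure at WARNING level while still attaching the traceback. The logger name `demo.warning`, the message text, and the formatter layout are assumptions chosen for the example.

```python
import io
import logging

buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("demo.warning")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(handler)

try:
    int("not-a-number")
except ValueError:
    # Recoverable problem: WARNING level, but the traceback is still attached.
    logger.warning("Could not parse value, using default", exc_info=True)

output = buffer.getvalue()
print(output)
```

The first line of the output carries the WARNING level and the message, and the traceback for the ValueError follows it.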

Message Formatting

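When you pass variables into a log message, the way you format it affects both performance and how well log aggregation systems (such as Sentry or Datadog) can group events.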

Do: Lazy Formatting

logger.error("User %s failed", user_id)

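Advantages:
1. Performance: the string is formatted only if that log level is actually enabled.
2. Grouping: a log monitoring system can recognize "User %s failed" as a template and group the events as one kind of error, rather than as thousands of different errors.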
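
The performance point can be observed directly. The sketch below uses a hypothetical `Expensive` class that counts how often it is converted to a string, and logs it at a disabled level with both styles. The class and the logger name `demo.lazy` are illustrative assumptions.

```python
import logging


class Expensive:
    """Hypothetical object that counts its string conversions."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "expensive-value"


logger = logging.getLogger("demo.lazy")  # hypothetical logger name
logger.setLevel(logging.ERROR)  # DEBUG records are disabled
logger.propagate = False
logger.addHandler(logging.NullHandler())

lazy_arg = Expensive()
logger.debug("Value is %s", lazy_arg)  # disabled level: never formatted
lazy_calls = lazy_arg.str_calls

eager_arg = Expensive()
logger.debug(f"Value is {eager_arg}")  # the f-string is built before debug() runs
eager_calls = eager_arg.str_calls

print(lazy_calls, eager_calls)  # 0 1
```

Note that lazy formatting defers only the string conversion. The argument expression itself is still evaluated before the call, so an expensive function call passed as an argument would still run.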

Don’t: F-string

logger.error(f"User {user_id} failed")

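Disadvantages:
1. The string interpolation runs immediately (even if the log is never emitted).
2. The monitoring system treats every log as a brand-new event, which makes it hard to measure error frequency.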

Why not use f-strings in logs?

When using f-strings, log aggregation services (e.g., Sentry, Datadog, ELK) may treat each log message as a different error due to variable values in the message.

For example, suppose your system has 1,000 users who fail to log in within the same minute.

With an f-string, your code looks like:

logger.error(f"Login failed for user {user_id}")

Python builds the complete string and passes it to the log service.

Then, your log service may show:

Issue ID	Error Message (Group Name)	Count
#1	`Login failed for user 1`	1
#2	`Login failed for user 2`	1
…	…	…
#1000	`Login failed for user 1000`	1

Result:

Your issue list gets flooded, making it difficult to identify truly important new errors.
You also cannot set rules such as “send an alert if the same error occurs more than 50 times,” because from the system’s perspective, each error appears to have occurred only once.

But if we use lazy formatting:

logger.error("Login failed for user %s", user_id)

Python sends the log message template (Login failed for user %s) and the arguments (for example, 101) to the logging system separately.

Issue ID	Error Message (Group Name)	Count
#1	`Login failed for user %s`	1000

You can clearly see that this error occurred 1,000 times, and you can drill down to view which specific user_ids were affected.
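
The difference can be imitated locally with a toy model. The sketch below assumes a hypothetical `TemplateCounter` handler that groups records by their unformatted message (`record.msg`). Real services apply their own grouping rules, so treat this only as an illustration of why the template matters.

```python
import logging
from collections import Counter


class TemplateCounter(logging.Handler):
    """Toy stand-in for an aggregation service: groups records by record.msg."""

    def __init__(self):
        super().__init__()
        self.groups = Counter()

    def emit(self, record):
        # record.msg is the unformatted message; record.args holds the values.
        self.groups[record.msg] += 1


def count_groups(log_call, users=1000):
    """Log one failure per user and return the resulting groups."""
    counter = TemplateCounter()
    logger = logging.getLogger("demo.grouping")  # hypothetical logger name
    logger.setLevel(logging.ERROR)
    logger.propagate = False
    logger.addHandler(counter)
    try:
        for user_id in range(1, users + 1):
            log_call(logger, user_id)
    finally:
        logger.removeHandler(counter)
    return counter.groups


fstring_groups = count_groups(lambda log, uid: log.error(f"Login failed for user {uid}"))
lazy_groups = count_groups(lambda log, uid: log.error("Login failed for user %s", uid))

print(len(fstring_groups))  # 1000
print(dict(lazy_groups))    # {'Login failed for user %s': 1000}
```

With the f-string, the handler only ever sees finished strings, so every user produces a separate group. With lazy formatting, all 1,000 records share one template.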

Provide context

Log messages should describe what was happening, not just what the error was. (The traceback already contains the error message.)

Bad: logger.exception(f"Error: {e}")
Good: logger.exception("Failed to upload file to S3 bucket")

This provides business logic context. When you read the logs, you immediately understand what the code was trying to do.
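
A small sketch of the "Good" variant follows. `upload_report` is a hypothetical function that always fails, and the logger name `demo.context` is likewise invented for the example. The point is that the exception's own message already appears at the end of the traceback, so the log message is free to describe the operation instead.

```python
import io
import logging

buffer = io.StringIO()
logger = logging.getLogger("demo.context")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(logging.StreamHandler(buffer))


def upload_report(path):
    """Hypothetical upload step that always fails, for illustration."""
    raise ConnectionError("connection timed out")


try:
    upload_report("report.csv")
except ConnectionError:
    # Describe what the code was doing; the traceback carries the error text.
    logger.exception("Failed to upload file to S3 bucket")

output = buffer.getvalue()
print(output)
```

The first line of the output says what was being attempted, and the last line of the traceback says what went wrong.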

Last updated on 2025-12-16