Federated Learning: Privacy-Preserving Collaborative Models

Training Machine Learning Models Across Decentralized Data Without Sharing Raw Information

Authored by: Loveleen Narang

Date: April 2, 2025

Introduction: The Need for Collaborative Privacy

Machine learning (ML) thrives on data. Traditionally, this meant collecting vast amounts of user data into centralized servers for model training. However, growing concerns about data privacy, coupled with regulations like GDPR and HIPAA, make centralization increasingly problematic, especially for sensitive data generated on personal devices, in hospitals, or within financial institutions. Federated Learning (FL) emerges as a groundbreaking alternative paradigm.

FL enables multiple clients (e.g., mobile devices, hospitals) to collaboratively train a shared ML model under the coordination of a central server, critically, without exchanging their raw local data. Instead of bringing the data to the model, FL brings the model to the data: clients train the model locally and share only model updates (such as parameter gradients or weights) with the server, which aggregates them to improve the global model. This approach inherently supports privacy and data minimization, unlocking collaborative ML in settings previously hindered by privacy constraints.

Centralized vs. Federated Learning


Fig 1: Comparison of data flow in Centralized vs. Federated Learning.

The Federated Learning Process: Federated Averaging (FedAvg)

The most common FL algorithm is Federated Averaging (FedAvg). It operates in rounds, typically involving these steps:

  1. Initialization: The central server initializes a global model \( \theta^0 \).
  2. Client Selection: The server selects a subset of available clients \( S_t \) (e.g., uniformly at random) to participate in the current round \( t \). Let \( K \) be the total number of clients and \( n_k \) the number of data points on client \( k \), so the total data size is \( n = \sum_{k=1}^K n_k \).
  3. Distribution: The server sends the current global model \( \theta^t \) to the selected clients in \( S_t \).
  4. Local Training: Each selected client \( k \in S_t \) updates the model on its local data \( \mathcal{P}_k \), typically performing \( E \) local epochs of optimization (e.g., stochastic gradient descent, SGD) starting from \( \theta^t \) to obtain a local model \( \theta_k^{t+1} \). Each local SGD step is \( \theta \leftarrow \theta - \eta \nabla L(x_i, y_i; \theta) \), where \( \eta \) is the learning rate.
  5. Communication: Each selected client \( k \) sends its updated model parameters \( \theta_k^{t+1} \) (or the update \( \Delta_k^t = \theta_k^{t+1} - \theta^t \)) back to the server. Crucially, the raw data \( \mathcal{P}_k \) never leaves the client.
  6. Aggregation: The server aggregates the updates, typically as a weighted average based on the amount of data each client trained on, to produce the new global model \( \theta^{t+1} \):
    $$ \theta^{t+1} \leftarrow \sum_{k \in S_t} \frac{n_k}{\sum_{j \in S_t} n_j} \theta_k^{t+1} $$
    Or, equivalently, in terms of updates:
    $$ \theta^{t+1} \leftarrow \theta^t + \sum_{k \in S_t} \frac{n_k}{\sum_{j \in S_t} n_j} \Delta_k^t $$
  7. Iteration: Steps 2-6 repeat for a fixed number of communication rounds \( T \) or until convergence.
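The steps above can be sketched as a minimal NumPy simulation. This is an illustrative toy, with hypothetical names, linear-regression clients, and plain weight vectors rather than a production FL framework:

```python
# Minimal FedAvg simulation: "clients" are (X, y) shards of noiseless linear
# data, and the model is a plain NumPy weight vector.
import numpy as np

def local_sgd(theta, X, y, epochs=5, lr=0.01):
    """Step 4: a client runs E local epochs of SGD from the global model."""
    theta = theta.copy()
    for _ in range(epochs):
        for i in range(len(X)):
            grad = (X[i] @ theta - y[i]) * X[i]  # dL/dtheta for squared loss
            theta -= lr * grad
    return theta

def fedavg_round(theta_global, clients, rng):
    """Steps 2-6: sample clients, train locally, aggregate weighted by n_k."""
    selected = rng.choice(len(clients), size=max(1, len(clients) // 2), replace=False)
    n_total = sum(len(clients[k][0]) for k in selected)
    theta_new = np.zeros_like(theta_global)
    for k in selected:
        X_k, y_k = clients[k]
        theta_k = local_sgd(theta_global, X_k, y_k)
        theta_new += (len(X_k) / n_total) * theta_k  # weighted by n_k / sum n_j
    return theta_new

rng = np.random.default_rng(0)
true_theta = np.array([2.0, -1.0])
clients = []
for _ in range(4):                       # K = 4 clients with different n_k
    n_k = rng.integers(20, 50)
    X = rng.normal(size=(n_k, 2))
    clients.append((X, X @ true_theta))

theta = np.zeros(2)                      # Step 1: server initializes theta^0
for t in range(20):                      # Step 7: iterate T communication rounds
    theta = fedavg_round(theta, clients, rng)
print(theta)                             # approaches [2, -1]
```

Note that the server never touches the raw shards: it needs only each client's trained parameters and sample count \( n_k \) to form the weighted average.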

Mathematically, the goal is to minimize a global objective function \( F(\theta) \), the data-weighted average of the local loss functions \( F_k(\theta) \):

$$ \min_\theta F(\theta) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(\theta), \qquad F_k(\theta) = \frac{1}{n_k} \sum_{i \in \mathcal{P}_k} L(x_i, y_i; \theta) $$

Federated Averaging (FedAvg) Cycle


Fig 2: The iterative cycle of the Federated Averaging (FedAvg) algorithm.

Federated Learning Variants

Depending on how data is distributed across clients, FL can be categorized:

| Variant | Data Partitioning | Description | Example Use Case |
|---|---|---|---|
| Horizontal FL (HFL) | Same feature space, different samples/users | Clients hold datasets with the same features but different instances (e.g., different users' phone data). This is the setting for FedAvg. | Mobile keyboard prediction across different users. |
| Vertical FL (VFL) | Different feature spaces, same samples/users | Clients hold different features covering the same set of instances (e.g., a bank and an e-commerce company have different data about the same customers). Requires more complex coordination and often involves encryption. | Collaborative credit scoring between a bank and an online retailer. |
| Federated Transfer Learning (FTL) | Different feature spaces and samples (with some overlap) | Applies when datasets differ in both samples and features; leverages transfer learning techniques within the federated setting. | Using knowledge from a model trained on retail data in one region to help train a model for a different region with partially overlapping user bases. |

Privacy-Preserving Techniques in Federated Learning

While FL prevents direct data sharing, the model updates themselves can potentially leak information about the client's local data through various attacks (e.g., membership inference, property inference, reconstruction attacks). Therefore, additional Privacy-Enhancing Technologies (PETs) are often integrated into FL.

Differential Privacy (DP)

DP provides strong, mathematically provable privacy guarantees by adding carefully calibrated noise to data or computations. A randomized mechanism \( \mathcal{M} \) is \( (\epsilon, \delta) \)-differentially private if, for all neighbouring datasets \( D, D' \) differing in one individual's data and all output sets \( S \):

$$ \Pr[\mathcal{M}(D) \in S] \le e^{\epsilon} \Pr[\mathcal{M}(D') \in S] + \delta $$

In other words, the outcome of an analysis is statistically similar whether or not any single individual's data is included. In FL, the noise is typically added to clipped client updates before they are sent.

DP introduces a trade-off: stronger privacy (lower \( \epsilon \)) requires more noise, which reduces model utility (accuracy).
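A client-side clip-and-noise step in this spirit can be sketched as follows. This is a toy illustration of the Gaussian mechanism with an assumed calibration \( \sigma = C\sqrt{2\ln(1.25/\delta)}/\epsilon \) (valid for \( \epsilon < 1 \)); real deployments track cumulative privacy loss with an accountant, and all names here are illustrative:

```python
# Sketch of clipping + Gaussian noise on a client update before it is sent,
# in the spirit of DP-SGD.
import numpy as np

def privatize_update(delta, clip_norm, epsilon, delta_dp, rng):
    # 1. Clip: bound the L2 sensitivity of the update to clip_norm (C).
    norm = np.linalg.norm(delta)
    clipped = delta * min(1.0, clip_norm / norm)
    # 2. Noise: Gaussian mechanism calibrated to the clipped sensitivity.
    sigma = clip_norm * np.sqrt(2 * np.log(1.25 / delta_dp)) / epsilon
    return clipped + rng.normal(0.0, sigma, size=delta.shape)

rng = np.random.default_rng(1)
update = np.array([3.0, 4.0])            # raw client update, L2 norm = 5
noisy = privatize_update(update, clip_norm=1.0, epsilon=0.5, delta_dp=1e-5, rng=rng)
print(np.linalg.norm(update * min(1.0, 1.0 / 5.0)))  # clipped norm is 1.0
```

Clipping first is essential: without a hard bound on each update's norm, the sensitivity is unbounded and no finite amount of noise yields a DP guarantee.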

Differential Privacy: Adding Noise for Privacy


Fig 3: Conceptual illustration of adding noise via Differential Privacy.

Homomorphic Encryption (HE)

HE allows computations (such as addition or multiplication) to be performed directly on encrypted data (ciphertexts) without decrypting it first. Decrypting the result yields the same value as performing the computation on the plaintexts; with an additively homomorphic scheme, for instance, \( \mathrm{Enc}(a) \cdot \mathrm{Enc}(b) \) decrypts to \( a + b \), so a server can aggregate encrypted updates without ever seeing them.
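As a concrete sketch, a toy Paillier cryptosystem (additively homomorphic) demonstrates the core property: multiplying ciphertexts corresponds to adding plaintexts. The primes below are absurdly small and chosen purely for illustration; this is not a secure implementation:

```python
# Toy Paillier cryptosystem: the product of two ciphertexts decrypts to the
# sum of the plaintexts, so a server could sum encrypted client updates
# without seeing them. Toy primes only, far too small for real security.
import math
import secrets

p, q = 293, 433                     # toy primes (never use sizes like this)
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)        # lambda = lcm(p-1, q-1)
g = n + 1                           # standard choice of generator
mu = pow(lam, -1, n)                # with g = n+1, mu = lambda^{-1} mod n

def encrypt(m):
    while True:                     # random r in [1, n) coprime to n
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    x = pow(c, lam, n2)
    return ((x - 1) // n * mu) % n  # L(x) = (x - 1) / n, then scale by mu

a, b = 42, 99
c = (encrypt(a) * encrypt(b)) % n2  # multiply ciphertexts...
print(decrypt(c))                   # ...decrypts to a + b = 141
```

In an FL setting, clients would encrypt their (quantized) updates, the server would multiply the ciphertexts to obtain the encrypted sum, and only a key holder could decrypt the aggregate.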

Secure Multi-Party Computation (SMPC or SMC)

SMPC protocols allow multiple parties to jointly compute a function over their private inputs without revealing those inputs to each other or to any other party. In FL, this underpins secure aggregation: the server learns only the (weighted) sum of the client updates, not any individual update.
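One simple instantiation is secure aggregation via pairwise additive masking, the core idea behind protocols such as Bonawitz et al.'s, here heavily simplified: no dropout handling or key agreement, and the shared masks are sampled up front purely for illustration:

```python
# Sketch of secure aggregation via pairwise additive masking: each pair of
# clients shares a random mask that one adds and the other subtracts, so the
# masks cancel in the sum and the server learns only the total.
import numpy as np

rng = np.random.default_rng(2)
updates = [rng.normal(size=3) for _ in range(4)]   # true client updates Delta_k
K = len(updates)

# Each unordered pair (i, j), i < j, shares a random mask r_ij.
masks = {(i, j): rng.normal(size=3) for i in range(K) for j in range(i + 1, K)}

def masked_update(k):
    """Client k sends Delta_k + sum_{j>k} r_kj - sum_{i<k} r_ik."""
    out = updates[k].copy()
    for (i, j), r in masks.items():
        if i == k:
            out += r
        elif j == k:
            out -= r
    return out

# The server sums the masked updates; the pairwise masks cancel exactly,
# revealing only the aggregate.
server_sum = sum(masked_update(k) for k in range(K))
print(np.allclose(server_sum, sum(updates)))   # True: masks cancel
```

Any single masked update looks like noise to the server; real protocols add secret-shared keys so the sum can still be recovered when clients drop out mid-round.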

Secure Aggregation Concept


Fig 4: Secure Aggregation prevents the server from seeing individual updates.

Comparison of Privacy Techniques in Federated Learning
| Technique | Mechanism | Pros | Cons | Primary Protection Against |
|---|---|---|---|---|
| Differential Privacy (DP) | Add calibrated noise | Strong mathematical guarantees; relatively low computational overhead | Reduces model accuracy (utility cost); requires careful parameter tuning (\(\epsilon, \delta\)) | Inference attacks on individual contributions/data points |
| Homomorphic Encryption (HE) | Compute on encrypted data | No accuracy loss due to noise; strong privacy against the server | Very high computational overhead; often limited operation types (PHE vs. FHE) | Server snooping on intermediate updates |
| Secure Multi-Party Computation (SMPC) | Cryptographic protocols (e.g., secret sharing) | No accuracy loss due to noise; strong privacy against the server | High communication overhead; requires coordination and non-collusion assumptions | Server snooping on intermediate updates |

These techniques are not mutually exclusive and can sometimes be combined (e.g., using SMPC for aggregation and applying DP locally).

Challenges in Federated Learning

Despite its promise, FL faces significant practical challenges:

  - Statistical heterogeneity: client data is typically non-IID and unbalanced, which can slow or destabilize the convergence of algorithms like FedAvg.
  - Systems heterogeneity: clients differ widely in compute, memory, battery, and connectivity, and may drop out mid-round.
  - Communication efficiency: repeatedly exchanging model updates can dominate training cost, motivating update compression and fewer rounds.
  - Security and privacy: updates can still leak information, and malicious clients can attempt poisoning or backdoor attacks.
  - Fairness: the global model may perform unevenly across clients with very different data distributions.

Applications of Federated Learning

FL is being explored and deployed in various domains where data privacy is paramount:

  - Mobile devices: next-word prediction and on-device personalization (e.g., keyboard suggestions) without uploading keystrokes.
  - Healthcare: hospitals collaboratively training diagnostic models without sharing patient records.
  - Finance: institutions building fraud-detection or credit-scoring models across organizational boundaries.
  - IoT and edge computing: learning across sensors and vehicles where bandwidth and privacy constraints preclude centralizing data.

Conclusion

Federated Learning represents a paradigm shift in machine learning, enabling collaborative model training while respecting data privacy and locality. By keeping raw data decentralized and leveraging privacy-enhancing technologies like Differential Privacy, Homomorphic Encryption, and Secure Multi-Party Computation, FL opens doors to applications previously blocked by privacy hurdles. However, realizing its full potential requires overcoming significant challenges related to data heterogeneity, system constraints, communication efficiency, security, and fairness. As research progresses and frameworks mature, Federated Learning is poised to become an increasingly vital tool for building intelligent systems responsibly in a data-sensitive world.


About the Author, Architect & Developer

Loveleen Narang is a seasoned leader in the field of Data Science, Machine Learning, and Artificial Intelligence. With extensive experience in architecting and developing cutting-edge AI solutions, Loveleen focuses on applying advanced technologies to solve complex real-world problems, driving efficiency, enhancing compliance, and creating significant value across various sectors, particularly within government and public administration. His work emphasizes building robust, scalable, and secure systems aligned with industry best practices.