Generating Physically Relevant Synthetic Data For Battery Faults

Introduction

As battery systems mature and their use cases become more widespread, there is a growing need to improve performance and efficiency through better algorithms for managing battery usage. The battery management system (BMS) software market is a fast-growing industry, valued at USD 8 billion and projected to grow to USD 39 billion over the next decade.

One big challenge in designing BMS software is testing: how do you test without access to data? Engineers are creatures of order and method, and they rejoice that engineering systems produce repeatable, predictable data within an error tolerance. The easy answer, therefore, is: “Let’s generate synthetic data!”

Use Cases of Synthetic Data

Algorithm Training & Prediction: Synthetic data enables the development and training of machine learning models for tasks such as state-of-charge (SoC) estimation, health prediction, and fault detection, especially when real-world data is limited or noisy.

Algorithm Testing & Validation: Synthetic data provides a controlled, repeatable environment for rigorously testing algorithms under diverse scenarios and edge cases.

Verification of Manufacturer Claims: Helps verify cell manufacturers’ performance and life cycle claims by simulating extended use cases without needing long-term real-world testing.

Benchmarking & Model Development: Facilitates benchmarking of different battery models and accelerates development cycles by offering quick access to varied data conditions.

Process of Generation

Here is an example: a 12S10P, 2.28 kWh two-wheeler battery pack built from LG M50 4.6 Ah NMC cells.

Parameter          Value
Cell               LG M50 21700 NMC
Nominal capacity   5 Ah
Configuration      12S10P
Vmin               2.5 V
Vmax               4.2 V
Imax               1 C
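The pack-level ratings follow from the cell table: capacities add across the 10 parallel cells and voltages add across the 12 series groups. The sketch below illustrates this under an ideal-pack assumption (identical cells, no interconnect losses). `CellSpec` and `pack_ratings` are hypothetical names, and the average cell voltage used for the energy estimate is an assumed input, not a datasheet value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellSpec:
    """Cell parameters taken from the table above."""
    nominal_capacity_ah: float = 5.0
    v_min: float = 2.5
    v_max: float = 4.2
    max_c_rate: float = 1.0


def pack_ratings(cell: CellSpec, n_series: int = 12, n_parallel: int = 10,
                 avg_cell_voltage: float = 3.8) -> dict:
    """Ideal-pack ratings for an nSmP layout (identical cells assumed).

    avg_cell_voltage is an ASSUMPTION used only for the energy estimate.
    """
    capacity_ah = cell.nominal_capacity_ah * n_parallel   # parallel cells add capacity
    return {
        "capacity_ah": capacity_ah,
        "v_min": cell.v_min * n_series,                   # series groups add voltage
        "v_max": cell.v_max * n_series,
        "i_max_a": cell.max_c_rate * capacity_ah,         # C-rate applied at pack level
        "energy_kwh": capacity_ah * avg_cell_voltage * n_series / 1000.0,
    }


if __name__ == "__main__":
    for name, value in pack_ratings(CellSpec()).items():
        print(f"{name}: {value:.2f}")
```

This gives a 30 V to 50.4 V window, 50 Ah and a 50 A maximum current. The 3.8 V default was picked only because it reproduces the 2.28 kWh quoted above. Substitute the actual average discharge voltage for a real estimate.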

We introduced the following real-life faults into the data:

Fault / Possible causes
Cell overheating
  • Increase in the Internal Resistance
  • Internal Short Circuit
  • External Short Circuit
  • Loose connection
  • Cooling system failure
Voltage Imbalance
  • Initial Capacity and Resistance variations in the cells
  • Degradation
  • Localized overheating of individual cells
Pack Over-discharge
  • Abuse
  • Power loss
Pack Over-charge
  • BMS fault
  • Cell overcharge due to imbalance
Loose connections
  • External Short Circuit
  • Vibrations
  • Improper connection
Sensor Faults
  • Signal loss
  • Vibration causing a loose connection
Sudden Drop in Capacity (Knee Behaviour)
  • Nearing End of Life
  • Battery Pack Fault
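Many of the causes above are physical, and that is what allows them to be injected into a model. As one illustration, the sketch below uses a single-node (lumped) thermal model, m·c·dT/dt = I²R − hA(T − T_amb). It shows how two of the listed causes of cell overheating raise cell temperature: an increase in internal resistance and a cooling-system failure. This is not necessarily the model used to generate the data described here. All parameter values (`resistance_ohm`, `h_area_w_per_k`, mass, specific heat) are hypothetical placeholders, not LG M50 data.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LumpedThermalCell:
    """Single-node thermal model of one cell (illustrative parameters only)."""
    resistance_ohm: float = 0.02               # hypothetical DC internal resistance
    h_area_w_per_k: float = 0.05               # hypothetical heat-transfer coeff. x area
    mass_kg: float = 0.069                     # hypothetical cell mass
    heat_capacity_j_per_kg_k: float = 1000.0   # hypothetical specific heat

    def steady_state_rise_k(self, current_a: float) -> float:
        """Rise above ambient at which Joule heating I^2*R equals convective loss."""
        return current_a ** 2 * self.resistance_ohm / self.h_area_w_per_k

    def simulate_temp_c(self, current_a: float, t_amb_c: float,
                        duration_s: float, dt_s: float = 1.0) -> float:
        """Forward-Euler integration of m*c*dT/dt = I^2*R - hA*(T - T_amb)."""
        temp_c = t_amb_c
        thermal_mass = self.mass_kg * self.heat_capacity_j_per_kg_k
        for _ in range(int(duration_s / dt_s)):
            q_gen = current_a ** 2 * self.resistance_ohm
            q_loss = self.h_area_w_per_k * (temp_c - t_amb_c)
            temp_c += (q_gen - q_loss) * dt_s / thermal_mass
        return temp_c


healthy = LumpedThermalCell()
# Fault injection: scale the physical parameter that the fault degrades.
high_resistance = replace(healthy, resistance_ohm=2 * healthy.resistance_ohm)
cooling_failure = replace(healthy, h_area_w_per_k=0.5 * healthy.h_area_w_per_k)

if __name__ == "__main__":
    i_cell = 5.0  # 1 C for a 5 Ah cell
    for name, cell in [("healthy", healthy), ("2x resistance", high_resistance),
                       ("cooling failure", cooling_failure)]:
        print(f"{name}: +{cell.steady_state_rise_k(i_cell):.1f} K at steady state")
```

Doubling the resistance or halving the cooling capacity doubles the steady-state temperature rise. This is the kind of physically consistent signature a synthetic overheating fault should carry.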

The above faults were introduced across different packs and time periods. These faults were either injected in isolation or in varying combinations, enabling us to simulate diverse degradation scenarios observed in real-world field data.
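Injecting faults in isolation or in combination, across packs and time periods, can be expressed as a random schedule per pack. The sketch below rests on several assumptions, none taken from the original study. It uses a hypothetical `FAULTS` list mirroring the table and chooses 0 to 3 faults per pack uniformly: 0 leaves the pack healthy, 1 is an isolated fault, and 2 or more form a combination. Onset dates are drawn uniformly from a hypothetical October 2023 to March 2025 window.

```python
import random
from datetime import date, timedelta

# Hypothetical identifiers for the fault categories listed above.
FAULTS = [
    "cell_overheating", "voltage_imbalance", "pack_overdischarge",
    "pack_overcharge", "loose_connection", "sensor_fault", "knee",
]


def sample_fault_schedule(rng: random.Random, start: date, end: date,
                          max_faults: int = 3) -> dict:
    """Pick 0..max_faults distinct faults and a random onset date for each."""
    n_faults = rng.randint(0, max_faults)      # bounds are inclusive
    chosen = rng.sample(FAULTS, n_faults)      # distinct faults, no repeats
    span_days = (end - start).days
    return {fault: start + timedelta(days=rng.randint(0, span_days))
            for fault in chosen}


rng = random.Random(42)  # fixed seed so the synthetic fleet is reproducible
schedules = [sample_fault_schedule(rng, date(2023, 10, 1), date(2025, 3, 31))
             for _ in range(30)]

if __name__ == "__main__":
    for pack_id, schedule in enumerate(schedules[:5], start=1):
        ordered = sorted(schedule.items(), key=lambda kv: kv[1])
        print(pack_id, [(fault, d.isoformat()) for fault, d in ordered])
```

Seeding the generator makes every pack's fault history reproducible. This matters when the same synthetic fleet is reused to compare algorithms.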

Data Generation Approach

To generate pack degradation responses, multiple types of cells were first created by introducing the faults described above. Next, 30 battery packs, each comprising 120 cells, were assembled using randomly sampled data from a synthetically generated set of 28 unique cell profiles. Data generation was carried out in parallel (ten models at a time). The entire process of generating the data and assembling the packs took about 3.5 days of dedicated time. The key steps were:

Sr. No.  Step                                                          Time required (approx.)
1        Define the DoE for cells with “physical” anomalies            1 day
2        Model setup, sanity checks and computation (~1.5 h per cell)  1 day
3        Synthetic data analysis (before assembling packs)             1 day
4        Assembly of 30 packs by random sampling of cells              0.5 day

DoE: design of experiments.

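Step 4 and the parallel computation can be sketched as follows. The sketch makes three assumptions. Cells are drawn with replacement, since 120 cells from 28 profiles requires repeats. Each profile is summarised by a single capacity value. The usable pack capacity is set by the weakest of the 12 parallel groups. `assemble_pack`, `pack_capacity_ah`, `batched_wall_clock_h` and the capacity range of the stand-in profiles are all hypothetical. The last helper reproduces the batch arithmetic for 28 cells run ten at a time at about 1.5 h each.

```python
import math
import random

N_SERIES, N_PARALLEL = 12, 10   # 12S10P: 120 cells per pack


def assemble_pack(rng: random.Random, profiles: list) -> list:
    """Draw 120 cells with replacement and split them into 12 parallel groups."""
    cells = [rng.choice(profiles) for _ in range(N_SERIES * N_PARALLEL)]
    return [cells[g * N_PARALLEL:(g + 1) * N_PARALLEL] for g in range(N_SERIES)]


def pack_capacity_ah(pack: list) -> float:
    """Parallel capacities add; the series string is limited by its weakest group."""
    return min(sum(cell["capacity_ah"] for cell in group) for group in pack)


def batched_wall_clock_h(n_cells: int, workers: int, hours_per_cell: float) -> float:
    """Wall-clock time when simulations run in batches of `workers` at a time."""
    return math.ceil(n_cells / workers) * hours_per_cell


rng = random.Random(7)
# Stand-ins for the 28 synthetic cell profiles (capacities are illustrative).
profiles = [{"id": i, "capacity_ah": rng.uniform(4.0, 5.0)} for i in range(28)]
packs = [assemble_pack(rng, profiles) for _ in range(30)]

if __name__ == "__main__":
    print(f"weakest pack: {min(pack_capacity_ah(p) for p in packs):.1f} Ah")
    print(f"cell simulation wall clock: {batched_wall_clock_h(28, 10, 1.5)} h")
```

Three batches of about 1.5 h give roughly 4.5 h of compute. That is consistent with the one day budgeted for step 2 once model setup and sanity checks are included.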
Results

Due to the faults introduced across different timelines and combinations, the synthetic packs exhibited a wide spectrum of degradation trajectories. Some packs experienced a gradual decline in SoH, while others showed an early onset of accelerated degradation, particularly those exposed to compounding stressors like overcharge combined with imbalance or overheating.

Figure 1 shows the SoH evolution of all battery packs over time. Although most started near 100% in October 2023, clear divergence is visible by mid-2024. Packs with multiple faults begin to separate from the healthier group, demonstrating the cascading effects of faults when left undetected or unmitigated. This divergence becomes even more pronounced as degradation accelerates toward late 2024 and early 2025.

Figure 1: SoH Vs Time
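One simple way to quantify the divergence seen in Figure 1 is to compare each pack's SoH with the fleet median at the same timestamp. The sketch assumes capacity-based SoH, meaning measured capacity over the 50 Ah nominal pack capacity. It also uses a hypothetical 5-percentage-point margin. Neither choice is stated to be how Figure 1 was computed, and the snapshot values are invented for illustration.

```python
from statistics import median


def soh_percent(capacity_ah: float, nominal_ah: float = 50.0) -> float:
    """Capacity-based state of health; 50 Ah is the 12S10P nominal capacity."""
    return 100.0 * capacity_ah / nominal_ah


def divergent_packs(soh_by_pack: dict, margin_pct: float = 5.0) -> list:
    """Pack ids whose SoH sits more than margin_pct points below the fleet median."""
    fleet_median = median(soh_by_pack.values())
    return sorted(pid for pid, soh in soh_by_pack.items()
                  if soh < fleet_median - margin_pct)


# Illustrative snapshot at one timestamp (not data from Figure 1).
snapshot = {"P01": soh_percent(46.0), "P02": soh_percent(45.75),
            "P03": soh_percent(42.0), "P04": soh_percent(45.4)}

if __name__ == "__main__":
    print(divergent_packs(snapshot))
```

Using the median rather than the mean keeps a few badly faulted packs from dragging the reference down and masking their own divergence.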
Figure 2: SoH R Vs Time for a Representative Pack

Figure 2 zooms in on a representative pack that underwent a combination of pack overcharge, voltage imbalance, and knee behaviour, highlighting the impact of severe fault stacking. The degradation observed in this pack shows how cumulative faults can lead to nonlinear degradation and sudden drops in SoH, which diagnostic systems would need to catch in real applications.
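Catching such a knee amounts to noticing when the degradation rate departs from its early trend. The sketch below is a simple slope-ratio heuristic, not a standard or published knee-point method. It compares the average SoH drop over a trailing window with the rate in the first window. It then flags the first sample where the trailing rate exceeds a hypothetical `factor` (default 3x) times the initial rate. The window length and factor are assumptions that would need tuning.

```python
def degradation_rate(soh: list, i: int, window: int) -> float:
    """Average SoH drop per sample over the `window` samples ending at index i."""
    return (soh[i - window] - soh[i]) / window


def detect_knee(soh: list, window: int = 5, factor: float = 3.0):
    """Return the first index where the trailing degradation rate exceeds
    `factor` times the initial rate, or None if no knee is found."""
    if len(soh) < 2 * window + 1:
        return None
    baseline = max(degradation_rate(soh, window, window), 1e-9)  # avoid a zero baseline
    for i in range(2 * window, len(soh)):
        if degradation_rate(soh, i, window) > factor * baseline:
            return i
    return None


# Synthetic trajectory: slow linear fade, then a sharp drop after sample 49.
soh = [100.0 - 0.1 * k for k in range(50)]
soh += [soh[-1] - 1.0 * (k + 1) for k in range(20)]

if __name__ == "__main__":
    print(detect_knee(soh))
```

On this trajectory the drop starts at sample 50 and is flagged at sample 51. The one-sample lag comes from averaging over a trailing window; shorter windows react faster but are more sensitive to noise.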

Figure 3: Percentage Voltage Variation Vs Time

Figure 3 illustrates the evolution of the percentage voltage difference over time for two representative packs, one with induced imbalance and one without. While both packs start with minimal deviation, the pack with imbalance (shown in red) diverges quickly around late December 2023. A pronounced spike exceeding 60% marks a moment of severe imbalance, after which the voltage spread stabilizes but remains consistently higher than that of the healthy pack.

This persistent voltage mismatch forces specific cells to operate outside optimal conditions, contributing to local overcharge or deep discharge, both of which accelerate capacity fade. By contrast, the balanced pack (green line) maintains a lower and relatively stable voltage difference, highlighting the benefit of uniform cell behavior in extending battery life.
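The percentage voltage difference plotted in Figure 3 is not defined above. This sketch assumes the common choice: the spread between the highest and lowest series-group voltages relative to their mean. In a 12S10P pack the cells within each parallel group share a terminal voltage, so the spread is taken over the 12 group voltages. The `threshold_pct` alarm level is hypothetical, not a value from the study.

```python
def voltage_spread_pct(group_voltages: list) -> float:
    """(Vmax - Vmin) / Vmean * 100 across series-group voltages (assumed definition)."""
    v_max, v_min = max(group_voltages), min(group_voltages)
    v_mean = sum(group_voltages) / len(group_voltages)
    return 100.0 * (v_max - v_min) / v_mean


def spread_alarms(samples: list, threshold_pct: float = 10.0) -> list:
    """Indices of samples whose voltage spread exceeds threshold_pct."""
    return [i for i, voltages in enumerate(samples)
            if voltage_spread_pct(voltages) > threshold_pct]


balanced = [3.60] * 12
imbalanced = [3.60] * 11 + [3.30]         # one weak parallel group
deeply_imbalanced = [3.60] * 11 + [2.60]  # one group near deep discharge

if __name__ == "__main__":
    for v in (balanced, imbalanced, deeply_imbalanced):
        print(f"{voltage_spread_pct(v):.1f} %")
    print(spread_alarms([balanced, imbalanced, deeply_imbalanced]))
```

Because the spread is normalised by the mean, a single group sagging toward the lower voltage limit dominates the metric. This is why a lone weak group can produce large percentage spikes.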

This plot reinforces how early-stage imbalance, even if initially subtle, can escalate and become a critical driver of long-term degradation. These findings validate the necessity of continuous monitoring and early correction in battery management systems.
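Continuous monitoring needs to happen at group level, because imbalance can push specific cells outside their limits while the pack as a whole looks healthy. The sketch below uses the window from the cell table (2.5 V to 4.2 V). In this hypothetical example, the pack terminal voltage sits exactly at its 50.4 V charge limit (12 × 4.2 V), yet one parallel group is at 4.35 V. A pack-level check passes, while a group-level check catches the local overcharge. The 4.35 V value and the function name `out_of_window` are illustrative assumptions.

```python
V_MIN, V_MAX = 2.5, 4.2     # cell limits from the table above
N_SERIES = 12


def out_of_window(group_voltages: list, v_min: float = V_MIN, v_max: float = V_MAX):
    """Return (overcharged, overdischarged) indices of parallel groups."""
    over = [i for i, v in enumerate(group_voltages) if v > v_max]
    under = [i for i, v in enumerate(group_voltages) if v < v_min]
    return over, under


# One high group; the remaining 11 share the rest of a 50.4 V pack voltage.
pack_limit_v = N_SERIES * V_MAX
groups = [4.35] + [(pack_limit_v - 4.35) / (N_SERIES - 1)] * (N_SERIES - 1)

if __name__ == "__main__":
    pack_v = sum(groups)
    print(f"pack voltage {pack_v:.2f} V, pack-level OK: {pack_v <= pack_limit_v + 1e-9}")
    print("overcharged groups:", out_of_window(groups)[0])
```

A charger that stops only on total pack voltage would keep charging this pack. This shows why early, per-group detection of imbalance matters.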

Get in touch with us if you would like to explore how you can generate high-fidelity synthetic data using the oorja suite of apps.