AI-POWERED VIBRATION SENSOR: INTELLIGENCE IN EVERY BEAT
The “Downtime” Nightmare: When the Clock Turns into Dollars

In the fast-paced world of Industry 4.0, every moment of production line downtime is not just the silence of machines, but an uncontrolled “leak” of budget.
- Double damage: You not only lose revenue from reduced production, but also face expensive emergency repair costs and the risk of order delays, eroding your reputation with partners.
- The numbers speak for themselves: According to recent industry reports, the average cost for one hour of unplanned machine downtime can reach tens, even hundreds of thousands of USD depending on the scale.
How to Calculate Your Downtime Costs
To understand why investing in a monitoring system early is crucial, let’s do a practical calculation of what your business actually loses when machinery is down. Before we begin, list the following key parameters:
- Total planned operating time: The number of hours the machine is scheduled to operate (e.g., an 8-hour shift/day).
- Average weekly output: The total number of products the machine produces in a normal week.
- Gross profit per unit: The profit earned for each finished product.
Damage calculation formula
The process for calculating actual damages is established through 3 steps:
- Determine the actual machine downtime: Planned time − Actual running time = Downtime hours
- Calculate the lost output:
(Total weekly output) / (Planned time) = Production rate per hour
Downtime hours × Production rate per hour = Total product shortfall
- Determine the total financial damage: Total product shortfall × Gross profit per unit = Total gross loss
Practical examples
Suppose a piece of equipment on your production line malfunctions and has to be shut down for 3 days.
- Context: The machine operates 8 hours/day (40 hours/week). Average output: 10,000 units/week.
- Efficiency: Production rate is 10,000 / 40 = 250 units/hour.
- Disruption: 3 days of downtime is equivalent to 24 hours of downtime.
| Item | Calculation | Result |
|---|---|---|
| Lost Production Volume | 24 hours × 250 units/hour | 6,000 units |
| Financial Damage (assuming $6 profit/unit) | 6,000 units × $6 | $36,000 |
The Verdict: In just 3 days of downtime, your business loses $36,000 in gross profit. Note that this figure excludes emergency repair costs, overtime pay for technicians, and potential late-delivery penalties.
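The three-step calculation above can be condensed into a small helper. This is an illustrative sketch only; the function name and parameters are ours, not part of any product:

```python
def downtime_loss(planned_hours, weekly_output, profit_per_unit, downtime_hours):
    """Estimate the gross profit lost to unplanned downtime."""
    rate_per_hour = weekly_output / planned_hours  # units produced per hour
    shortfall = downtime_hours * rate_per_hour     # units not produced
    return shortfall * profit_per_unit             # total gross loss in $

# The worked example: 40 planned hours/week, 10,000 units/week,
# $6 gross profit/unit, 24 hours of downtime.
print(downtime_loss(40, 10_000, 6, 24))  # 36000.0
```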
An AI-powered monitoring solution protects that $36,000 by detecting anomalies and preventing costly failures before they occur.
Remember: this is only the lost gross profit. Beyond the expensive emergency repairs, the true cost lies in the wasted time. It’s time to move past passive monitoring. This solution is more than a measuring device; it is a revolution in Edge AI. The breakthrough lies in its “brain”: instead of overloading servers with raw data, it analyzes and detects anomalies directly at the source.
- Instantaneous processing: Detects abnormal vibration and temperature readings in milliseconds and issues warnings before problems develop.
- Intelligent data filtering: Only the most important information is transmitted, optimizing network infrastructure and saving operating costs.
- From sensor to expert: With our device, you are not just installing a sensor; you are placing a “24/7 monitoring expert” directly on every machine axis and motor.

Industrial IoT Sensor Node & Rugged Enclosure
Mastering technology: The combination of AI & Qualcomm connectivity

Engineering Research & Development (R&D) Prototype Testing
This solution is more than just a piece of hardware; it is a symbol of the Vietnamese engineering team’s mastery of core technologies. The product represents the culmination of world-leading components and practical, problem-solving thinking tailored for factory environments:
- Edge AI Anomaly Detection – Intelligence at the Edge: The core difference lies in the machine learning algorithms, optimized to run directly on the device’s processor. Instead of passively sending raw data to the Cloud, our solution can “think for itself” and diagnose potential malfunctions on-site. This completely eliminates network latency, ensuring critical alerts are issued instantly before serious incidents occur.
- Qualcomm Super Connectivity – Penetrating All Barriers: In heavy industrial environments where dense metal machinery often causes signal interference, the device fully leverages the strengths of Qualcomm’s antenna technology on its gateway system. The result is ultra-stable connectivity and excellent penetration through obstacles, ensuring a continuous flow of information even in the harshest environments.

Edge AI Controller Board with Qualcomm Connectivity
- Low-power design: This product is the perfect combination of high-end hardware from Analog Devices and the sophisticated programming techniques of our Vietnamese engineering team. By optimizing each processing command and the deep sleep mode of the integrated circuit, the device can operate reliably for many years with only a single battery replacement, helping businesses eliminate concerns regarding regular system maintenance costs.
- Wireless Deployment: With its completely wireless design and intelligent installation structure, our solution redefines the concept of “industrial installation.” There is no need for complex wiring or infrastructure disruption; deploying hundreds of monitoring points across a wide area can be completed in just hours instead of weeks, as required by traditional solutions.
ThingIQ Platform: Optimizing the Orchestration Layer for the Edge AI Ecosystem
If hardware devices act as the Perception (Sensor) and Edge Computing layers, then ThingIQ is the ultimate Orchestration layer. The platform is designed to fully leverage the technical characteristics of strategic hardware partners:
- Physical Layer & Connectivity Management: Leveraging the advantages of Qualcomm chipsets, ThingIQ provides deep monitoring of telecommunications parameters. The system allows flexible configuration of Data Transmission Duty Cycle and Notification Heartbeat, balancing Real-time Latency and Power Consumption based on the signal status at the factory.
- Signal Conditioning: Raw data from power management ICs and sensors of Analog Devices (such as LDO voltage, Fuel-level voltage range) is processed by ThingIQ through standardization algorithms. Users can set Calibration Thresholds directly on the Dashboard to precisely define “Full/Empty” or “Normal/Abnormal Consumption” states, eliminating signal noise before analysis.

Real-Time Accelerometer Data (X/Y/Z)
- Firmware Operation and Maintenance (FOTA Ecosystem): This is the most important bridge to Edge AI Nodes. ThingIQ manages the entire firmware lifecycle via the FOTA (Firmware Over-The-Air) protocol. With specialized builds (such as version v1.0.3-sf4), administrators can push optimized Inference Models (AI inference models) from the Cloud to edge processors without system disruption. This ensures that Anomaly Detection capabilities are always up-to-date with the latest datasets.
- Massive IoT Management: With a database structure supporting thousands of network nodes (e.g., 1,457 physical records), ThingIQ supports Group Policy Management. Users can group devices by Device Type (Gateway, GPSS, Vibration) or Company/Project to apply consistent configuration profiles, making scaling up the system from a few nodes to thousands of nodes technically feasible and cost-effective.

Industrial AI Model Training Dashboard
Contact Us
Don’t let Downtime interrupt your cash flow.
👉 Contact us today for a professional consultation and a live demo at your facility!
Industrial Embedded Solutions Joint Stock Company (IES)
Our mission is to enhance business value by providing effective solutions and professional software applications that meet the rigorous demands of both Vietnamese and international enterprises.
- Hotline 24/7: +84 90 686 2311 | +84 77 413 5678
- Email: [email protected]
- Address: 7A Thoai Ngoc Hau, Hoa Thanh Ward, Tan Phu District, Ho Chi Minh City, Vietnam
- LinkedIn: Industrial Embedded Solutions JSC
Core Services:
- Embedded Systems, Firmware, and Software Development.
- IT Outsourcing & Automotive Technologies.
- Innovative Industrial IoT Solutions.
👉Ready to optimize your operations? Connect with our engineering team now!👈
STM32MP1 NAND Boot Not Working?
Why can’t the STM32MP1 boot from NAND Flash?
1. Layered Boot Chain Mechanism and the Role of ROM Code
1.1. Consistency of STM32 Image Header & Magic Number
1.2. Handshaking Protocol Between FMC and NAND (ONFI Compliance)
1.3. Bad Block Management (BBM) Mechanism
2. Check the Boot Pins (Hardware Strapping) configuration
2.1. Boot mode lookup table for the STM32MP1
2.2. How to check the voltage at the BOOT[2:0] pins
3. Compatibility Between ROM Code and ONFI Standard
3.1. How to Determine if a NAND Chip Supports ONFI
3.2. Handling Bus Width Errors (8-bit vs 16-bit) in Hardware
4. File Formatting Errors: The Importance of the STM32 Header
5. Configuring the Device Tree (DTS) for FMC and NAND
5.1. Setting the standard nand-ecc-mode and nand-ecc-strength according to the Datasheet
5.2. Optimizing FMC Clock Speed to Avoid Data Corruption
6. Analyzing Error Codes via UART Console
6.1. Connecting UART4 for ROM Code Debugging
6.2. Decoding Common Boot Failure Hex Codes
7. Checking for Bad Blocks and Flash Layout
7.1. Redundancy of FSBL on NAND
7.2. How to reload Flash Layout using STM32CubeProgrammer via DFU mode
Summary of the Quick Troubleshooting Checklist
Why can’t the STM32MP1 boot from NAND Flash?
To fix the “Silent Boot” error (no console response), we need to delve into the boot chain mechanism of the STM32MP1 system. This process is not simply about reading data; it’s a coordinated sequence between hardware (FMC) and a tightly structured data set defined by the ROM code.
1. Layered Boot Chain Mechanism and the Role of ROM Code

When the system exits the Reset state, the first program to run is the internal ROM Code: immutable code embedded in the SoC. Its core task is to initialize the FMC (Flexible Memory Controller) to locate and load the FSBL (First Stage Boot Loader) — typically TF-A (Trusted Firmware-A) or U-Boot SPL — into internal RAM (SYSRAM).
Problems often occur when the authentication chain or data loading process is interrupted at one of the following links:
1.1. Consistency of STM32 Image Header & Magic Number
ROM Code is an extremely strict state machine. It does not execute raw binary files (.bin) because it cannot determine the entry point and data integrity.
- Header Structure (256-byte): Contains important metadata including Image Length, Payload Checksum, and EntryPoint Address (the jump address in SYSRAM).
- Magic Number Identifier: The ROM Code scans the blocks for the value 0x324D5453 (the byte sequence 53 54 4D 32, i.e. the ASCII string “STM2”, read as a little-endian 32-bit word).
- Consequential Error: If the loaded file is missing this header or the header is offset, ROM Code will return the error “No valid boot device found”. This is why you must use a .stm32 file (processed using mkimage or STM32CubeProgrammer).
1.2. Handshaking Protocol Between FMC and NAND (ONFI Compliance)
Physical layer errors are often caused by incompatibility between the FMC (Flexible Memory Controller) and the memory chip.
- Automatic ONFI detection: The STM32MP1 prioritizes the ONFI (Open NAND Flash Interface) standard for querying operating parameters such as page size, block size, and spare area size.
- Non-ONFI risk: If the NAND chip does not support ONFI, the ROM Code falls back to default FMC timings. If there is also a discrepancy in bus width (8-bit vs 16-bit), the read data will be corrupted.
- ECC mismatch: This is the “silent killer.” The STM32MP1 uses hardware ECC (BCH4 or BCH8). If the programming tool writes data with an ECC algorithm different from the one the ROM Code uses for reading, the checksum fails and loading of the FSBL (First Stage Boot Loader) is immediately aborted.
1.3. Bad Block Management (BBM) Mechanism
Unlike stable storage media such as SD Cards or eMMCs, NAND Flash allows for the existence of physical errors (Bad Blocks).
- Redundancy Scan: The STM32MP1 ROM Code scans at least the first 128 KB of the NAND for valid headers.
- Skip-Block Mechanism: When encountering a Factory Bad Block (marked by the manufacturer), the ROM Code will automatically skip it and jump to the next block in the redundancy list.
- Programming Tool Logic Errors: A common technical error is that the programming tool fails to recognize the Bad Block or programs the wrong offset relative to the partition table (Flash Layout). If the FSBL is overwritten on a faulty block without being remapped, the ROM Code will be unable to initialize the Boot Chain.
2. Check the Boot Pins (Hardware Strapping) configuration.
2.1. Boot mode lookup table for the STM32MP1

The STM32MP1 boot mode is defined by the combination of several inputs:
- Three boot pins, accessible on ST boards; their possible values are shown in the first column of the table.
- The TAMP backup register number 20, which allows the user to force a serial boot when it is set to 0xFF from U-Boot or Linux.
- The one-time programmable (OTP) word 3, which contains a primary and a secondary boot source, shown in the third and fourth columns respectively. The possible boot sources are parallel NAND Flash, QUADSPI NOR Flash, eMMC, SD Card, and QUADSPI NAND Flash.
The boot pins have two special positions:
- All pins at zero force a boot in serial mode.
- The binary value 100 selects “no boot” mode, useful for taking control of the coprocessor via JTAG for firmware development without Linux.
2.2. How to check the voltage at the BOOT[2:0] pins

Configure hardware strapping with 10 kΩ resistors and a DIP switch to select the boot source (NAND/SD/USB).
Sometimes flipping a switch or soldering resistors doesn’t guarantee the correct logic level due to noise or voltage drop. For accurate debugging, you need to follow these steps:
- Static Voltage Check: Use a multimeter to measure the voltage directly at the test points of the BOOT0, BOOT1, and BOOT2 pins while the board is powered on. A high level must be ≥ 0.7 × VDD and a low level ≤ 0.3 × VDD.
- Check for interference (Oscilloscope): The ROM Code only latches the values of the BOOT pins at the rising edge of the NRST signal. If the power supply is slow or the BOOT pin has an excessively large filter capacitor causing delay, the ROM Code may read an incorrect value. Use an oscilloscope to ensure the logic level is stable before releasing NRST.
- Choose the pull-up/pull-down resistance carefully: If you are using resistors to set the logic levels, keep the value between 10 kΩ and 47 kΩ. Avoid excessively high values (such as 100 kΩ), as I/O leakage current can then shift the logic level.
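The voltage thresholds above can be sketched as a quick classification helper for bench notes (a hypothetical utility, not part of any ST tool):

```python
def boot_pin_level(v_pin: float, vdd: float = 3.3) -> str:
    """Classify a measured BOOT-pin voltage against the 0.7*VDD / 0.3*VDD thresholds."""
    if v_pin >= 0.7 * vdd:
        return "HIGH"
    if v_pin <= 0.3 * vdd:
        return "LOW"
    return "UNDEFINED"  # in the forbidden middle band: fix the strapping

print(boot_pin_level(3.0))  # HIGH  (>= 2.31 V at VDD = 3.3 V)
print(boot_pin_level(0.5))  # LOW   (<= 0.99 V)
print(boot_pin_level(1.5))  # UNDEFINED
```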
3. Compatibility Between ROM Code and ONFI Standard
One of the common reasons why the STM32MP1 cannot boot from NAND is the language difference between the ROM code and the flash chip. Unlike older microcontrollers that require hardcoded NAND parameters, the STM32MP1 prioritizes the use of an auto-recognition protocol.
3.1. How to Determine if a NAND Chip Supports ONFI
ONFI (Open NAND Flash Interface) is a standard that allows SoCs to query the technical specifications of NAND chips (such as block count, page size, spare area length) via the 0xEC command.
- Check the datasheet: Search for the keyword “ONFI” in the NAND chip’s technical documentation. Popular chip lines from Micron, Winbond, or Macronix usually support this standard.
- How does the ROM Code handle this?
During boot-up, the ROM Code sends the Read ID (0x90) and Read Parameter Page (0xEC) commands.
If the chip responds with the string “O-N-F-I”, the ROM Code will automatically configure the appropriate FMC controller for that chip.
- If the NAND does not support ONFI (Non-ONFI): The ROM Code will try based on the static ID table (Hardcoded IDs). If your chip is too new or too specific and not on ST’s supported list, the ROM Code won’t know how to read the data, leading to an immediate freeze.
3.2. Handling Bus Width Errors (8-bit vs 16-bit) in Hardware
Discrepancies in data bus width cause Data Mismatch errors, leading the ROM Code to incorrectly read the Bootloader Header.
- STM32MP1 Default: ROM Code initializes the FMC controller in 8-bit mode for backward compatibility.
- Using 16-bit NAND: If you are using a 16-bit NAND chip, the FMC_NIORDY pin (or some specific pin configuration) must be handled correctly. Most importantly, the STM32MP1 ROM Code only supports booting from 8-bit NAND. If your hardware design uses a 16-bit bus for the Boot partition, the system will not be able to boot (unless the NAND chip has an automatic 8-bit switching mode upon receiving a command).
- Physical connection check: Ensure that the signal lines from FMC_D0 to FMC_D7 are not short-circuited or open-circuited. Transmission impedance: With high access speeds, the data bus lines need to be of similar length to avoid signal skew.
4. File Formatting Errors: The Importance of the STM32 Header
A common mistake made by engineers new to the STM32MP1 family is directly loading raw binary files (.bin) onto NAND Flash. In reality, the STM32MP1’s ROM code doesn’t automatically understand where the binary file starts. It requires a technical “wrapper” surrounding the actual data, called the STM32 Image Header. If the loaded file lacks this header, the system will treat it as garbage and completely ignore it.
What Is the Magic Number 0x324D5453?
Each boot file (TF-A, U-Boot SPL) loaded onto NAND must begin with a 256-byte header. The most important component in this header is the Magic Number.

- Value: 0x324D5453 (stored in little-endian format, its bytes are 53 54 4D 32, corresponding to the ASCII string: “S-T-M-2”).
- Role: This is the “key” to unlock the ROM Code. When scanning through the Blocks on the NAND, the ROM Code only searches for this number. If the first 4 bytes of the Block do not match 0x324D5453, the ROM Code will immediately jump to the next Block.
- Other components in the Header: Image Signature, Image Length, Entry Point
When you compile with Yocto or Buildroot, the script will call the mkimage tool to package the u-boot-spl.bin file into u-boot-spl.stm32. This .stm32 extension is the indicator that the Header has been inserted. Otherwise you can use STM32CubeProgrammer to check Header:
- You can open the .stm32 file using hex editor software (such as HxD). If the first 4 bytes are 53 54 4D 32 (corresponding to “STM2”), your file has a standard header.
- When you connect the board in USB DFU mode and load the file into the corresponding partition: if you select a file without a header, the tool immediately warns of a formatting error, or the boot process fails with the log “Header Not Found”.
- Using a Flash Layout: the .tsv (Flash Layout) file in STM32CubeProgrammer defines partitions such as fsbl1 and fsbl2, and the tool automatically checks header integrity before pushing data to the NAND.
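The hex-editor check can be automated in a few lines of Python. A minimal sketch, assuming only that the image begins with the magic bytes ‘S’, ‘T’, ‘M’, 0x32 (53 54 4D 32):

```python
def has_stm32_header(image: bytes) -> bool:
    """Return True if the image starts with the STM32 magic bytes 'S','T','M',0x32."""
    return len(image) >= 4 and image[:4] == b"STM\x32"

# A packaged .stm32 image passes; a raw binary (e.g. an ELF) does not.
print(has_stm32_header(b"STM\x32" + b"\x00" * 252))  # True
print(has_stm32_header(b"\x7fELF"))                  # False
```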
5. Configuring the Device Tree (DTS) for FMC and NAND
After the ROM Code has successfully loaded the FSBL, the next stage (U-Boot and Kernel) depends entirely on the data structure in the Device Tree (.dts). If the parameters here do not match the physical characteristics of the NAND chip, the system will freeze when trying to mount the data partition or report an “ECC uncorrectable error”.
5.1. Setting the standard nand-ecc-mode and nand-ecc-strength according to the Datasheet
To ensure data integrity, you need to open the datasheet of your NAND chip and find the “ECC Requirement” section.
- nand-ecc-mode: For STM32MP1, this value is usually “hw” (using hardware FMC acceleration). If your NAND chip has built-in error correction, use “on-die”.
- nand-ecc-strength: The number of error bits the controller is capable of correcting per data area (Step size).
Example: If the datasheet requires “8-bit ECC per 512 bytes”, set nand-ecc-strength = <8>.
Note: The STM32MP1 supports different strength levels (BCH4, BCH8). If the NAND chip requires 8-bit and you only configure 4-bit, the system will run unstably and quickly suffer file corruption.

Standard Device Tree configuration sample
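As a sketch of what such a configuration can look like (node names, labels, and values here are illustrative assumptions modeled on mainline STM32MP1 device trees; always take the ECC numbers from your chip’s datasheet):

```dts
&fmc {
	status = "okay";

	nand-controller {
		status = "okay";

		nand@0 {
			reg = <0>;
			/* Assumed chip requirement: 8-bit ECC per 512-byte step */
			nand-ecc-strength = <8>;
			nand-ecc-step-size = <512>;
		};
	};
};
```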
5.2. Optimizing FMC Clock Speed to Avoid Data Corruption
Excessively high FMC control clock speeds cause random bit flips. The FMC controller must be configured with timings that match the NAND chip’s access speed.
- EBI Timings: Parameters such as tset, twait, and hold in the Device Tree define the minimum time for the data signal to stabilize before processing by the chip.
- Check Clock Speed: If you see many unidentified read errors in the U-Boot log, try lowering the FMC bus clock by adjusting the FMC timing properties or reconfiguring the Clock Tree in the system’s .dts file.

6. Analyzing Error Codes via UART Console
When your STM32MP1 remains silent, the Internal ROM Code provides a “last-resort” diagnostic tool. Even before the first line of your code executes, the ROM Code can output specific error status characters to help you pinpoint exactly where the boot process failed.
6.1.Connecting UART4 for ROM Code Debugging
In STMicroelectronics reference designs (such as the Discovery or Eval boards), UART4 is the dedicated default console for ROM Code and bootloader debugging.
Default Pin Assignment:
- UART4_TX: Pin PG11 (Usually routed through an onboard ST-LINK or USB-to-UART bridge).
- UART4_RX: Pin PB2.
Terminal Configuration:
- Baud rate: 115200
- Data bits: 8
- Parity: None
- Stop bits: 1
- Flow Control: None
6.2. Decoding Common Boot Failure Hex Codes
If the NAND boot fails, the ROM Code emits an error character as defined in the AN5031 technical application note. Below is a breakdown of the most common codes encountered during NAND debugging:
| Hex Code | Meaning | Troubleshooting / Solution |
|---|---|---|
| 0x61 | No Valid Header Found | The ROM Code scanned the entire NAND but did not find the Magic Number 0x324D5453. Check the .stm32 header file again. |
| 0x62 | Invalid Image Checksum | The header was found, but the internal data is corrupted or the ECC does not match. Verify the ECC configuration used during flashing. |
| 0x63 | Device Timeout / Not Ready | The NAND chip does not respond to read commands. Check the power supply (1.8V/3.3V) and the Ready/Busy (R/B) pin. |
| 0x64 | NAND ID Not Supported | The ROM Code can read the chip ID but does not support it (usually because it is not ONFI compliant). |
| 0x65 | Authentication Failed | This occurs when Secure Boot is enabled but the digital signature is invalid. |
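For quick triage of UART logs, the table can be turned into a small lookup helper. This is an illustrative sketch, not part of any ST tool; the codes simply mirror the table above:

```python
# ROM Code boot-failure codes as summarized in the table above (see AN5031).
BOOT_ERRORS = {
    0x61: "No valid header found: check the .stm32 header",
    0x62: "Invalid image checksum: verify the ECC configuration",
    0x63: "Device timeout / not ready: check power and the R/B pin",
    0x64: "NAND ID not supported: chip may not be ONFI compliant",
    0x65: "Authentication failed: Secure Boot signature invalid",
}

def decode_boot_error(code: int) -> str:
    """Map a ROM Code error byte to a short human-readable diagnosis."""
    return BOOT_ERRORS.get(code, f"Unknown code 0x{code:02X}")

print(decode_boot_error(0x61))
print(decode_boot_error(0x99))  # Unknown code 0x99
```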
7. Checking for Bad Blocks and Flash Layout
Unlike other memory types such as eMMC or SD Card, NAND Flash always comes with faulty blocks (bad blocks) from the factory. If the FSBL (TF-A) accidentally lands on a bad block without a redundancy mechanism, the STM32MP1 will never boot.
7.1. Redundancy of FSBL on NAND
The ROM code of the STM32MP1 is extremely intelligently designed to deal with bad blocks through a redundancy mechanism.
- FSBL Mirroring: Typically, we don’t load just a single FSBL. The system usually carries 2 to 5 copies of the FSBL (denoted fsbl1, fsbl2, fsbl3…) located in the first blocks of the NAND.
- ROM Code Scanning Mechanism: The ROM Code starts scanning from Block 0. If Block 0 is corrupted or lacks a valid STM32 header, it automatically jumps to the next block (usually Block 1, or the next offset depending on the configuration). This repeats until a complete FSBL is found.
- Offset Location: Typically, FSBLs are located at fixed positions (e.g., 0x00000000, 0x00040000, 0x00080000…). If you only load an fsbl1 into Block 0 and that block fails, the system will freeze.
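The fixed offsets in the example above follow directly from the erase-block size. A sketch, assuming 256 KB (0x40000) blocks and three FSBL copies, one per block:

```python
BLOCK_SIZE = 0x40000  # assumed 256 KB erase block

# One FSBL copy per block, starting at Block 0.
fsbl_offsets = [i * BLOCK_SIZE for i in range(3)]
print([hex(o) for o in fsbl_offsets])  # ['0x0', '0x40000', '0x80000']
```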
7.2. How to reload Flash Layout using STM32CubeProgrammer via DFU mode
When NAND boot fails completely, the most thorough solution is to reload the entire partition structure via USB DFU (Device Firmware Update) mode.
Step 1: Switch to Serial Boot mode
Set the BOOT pins to 000 (as described in section 2) and connect the board to the computer via the USB OTG port.
Step 2: Prepare the Flash Layout file (.tsv)
The .tsv file is a “map” defining the location of each component on the NAND. A standard NAND file usually looks like this:
Note: The Offset column must match the Erase Size architecture of the NAND chip you are using.
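Below is an illustrative sketch of a NAND FlashLayout (.tsv). The column layout follows STM32CubeProgrammer conventions, but the file names, IDs, and offsets are assumptions for a 256 KB-block NAND and must be adapted to your build (fields are tab-separated in the real file):

```
#Opt	Id	Name	Type	IP	Offset	Binary
-	0x01	fsbl1-boot	Binary	none	0x0	tf-a-serialboot.stm32
-	0x03	ssbl-boot	Binary	none	0x0	u-boot-serialboot.stm32
P	0x04	fsbl1	Binary	nand-0	0x00000000	tf-a-nand.stm32
P	0x05	fsbl2	Binary	nand-0	0x00040000	tf-a-nand.stm32
P	0x06	ssbl	Binary	nand-0	0x00080000	u-boot-nand.stm32
P	0x21	boot	System	nand-0	0x00200000	st-image-bootfs.ubi
```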
Step 3: Flash the file
- Open STM32CubeProgrammer.
- Select the USB connection and click Connect.
- Switch to the Erasing & Programming tab and select the prepared .tsv file.
- Click Download.
- The tool will automatically remove bad blocks, calculate ECC values, and flash headers for each partition.
- If you encounter a Partition overlap error, check the file size against the offset.
Summary of the Quick Troubleshooting Checklist
| Category | Checklist Item | Status | Technical Notes |
|---|---|---|---|
| 1. Hardware | Boot Pins Strapping | [ ] | Ensure BOOT[2:0] = 010 (NAND boot mode). |
| | Voltage Levels | [ ] | Measure BOOT pins: High ≥ 0.7 VDD, Low ≤ 0.3 VDD. |
| | Bus Width | [ ] | ROM Code supports 8-bit NAND by default. |
| | Ready/Busy (R/B) | [ ] | Check pull-up resistor and connection to the SoC. |
| 2. Image Header | Magic Number | [ ] | The first 4 bytes of the file must be 53 54 4D 32 (“STM2”). |
| | File Extension | [ ] | Use a .stm32 file, not a raw .bin file. |
| | Entry Point | [ ] | Must match the mapped address in SYSRAM. |
| 3. NAND & ECC | ONFI Support | [ ] | Confirm the NAND supports ONFI or is in ST’s supported list. |
| | ECC Strength | [ ] | Match nand-ecc-strength in DTS with the datasheet (4-bit / 8-bit). |
| | ECC Mode | [ ] | If using on-die ECC, disable FMC hardware ECC. |
| 4. Software / DTS | FSBL Redundancy | [ ] | Flash at least two copies (fsbl1, fsbl2) at the correct offsets. |
| | TSV Partition | [ ] | Check the .tsv file to ensure no partition overlap. |
| | FMC Timings | [ ] | Increase tset and twait in DTS if random data errors occur. |
| 5. Diagnostics | UART4 Log | [ ] | 115200, 8N1. Check hex error codes (0x61, 0x62, 0x63). |
| | USB DFU Mode | [ ] | Set BOOT to 000 to test the connection via STM32CubeProgrammer. |
📞 Technical Support & Consulting
Struggling with complex timing issues or custom hardware integration? We are here to help you accelerate your time-to-market.
Contact our embedded engineers at IES (Industrial Embedded Solutions) at [email protected] for expert hardware design review, custom bootloader development, and Linux kernel optimization.
STM32MP1 Custom Board Bring-Up and Flashing Procedure
Bring-up Board STM32MP1
1. Bring-up steps
2. DDR Tuning & Stress Test
The Flashing Chain
Check Verify Issue
Bring-up Board STM32MP1
1. Bring-up steps
Step 1: Check the hardware baseline.

Before powering on, ensure that the basic physical parameters do not cause a short circuit.
- Check the power rails: Use a multimeter (VOM) to measure the resistance of the main power rails (VDDCORE, VDD_DDR, VDD) to ground and confirm there is no short circuit.
- Power on: Observe the board’s current consumption. If the current spikes (High current), disconnect the power immediately.
- Check the PMIC: Measure the output voltage at the capacitors surrounding the PMIC (STPMIC1). Ensure the voltage is 1.2V for the Core and 1.35V or 1.5V for the DDR.
Step 2: Set up Boot Pins (Boot Configuration)
1. How Boot Pins Drive the BootROM Logic
Each microprocessor, upon leaving the factory, has a fixed, unchangeable piece of code called BootROM. When power is applied, BootROM scans the voltage state (Logic High/Low) on dedicated configuration pins such as BOOT0, BOOT1, and BOOT2.
- Mechanism: Flipping the switches (DIP switches) or changing the pull-up/pull-down resistors sends an “encoding signal” to the microprocessor.
- Purpose: To instruct the chip to skip searching for software in the internal storage memory (which is usually empty) and go straight to waiting for commands from external communication ports.
2. USB OTG Priority Mode (DFU Mode)
For the STM32MP1 series, the binary configuration is usually set to 000 (or according to the manufacturer’s specific reference diagram) to prioritize USB OTG (Device) mode.
- DFU (Device Firmware Upgrade): In this mode, the processor acts as a slave device. As soon as it connects to the Host PC via a USB Type-C cable, the BootROM will initialize a minimal USB stack so that the computer can recognize the device as a “USB DFU Device”. This is the state that allows the STM32CubeProgrammer tool to “see” the chip and be ready to transfer data.
3. Physical Bring-up Chain Diagram
To ensure stable and reliable data transmission, the bring-up chain diagram must adhere to the following structure:

In this connection chain, the USB_OTG port acts as the sole gateway. Correctly configuring the BOOT pins “opens” this gateway, allowing data from the computer to go directly to the lowest hardware layer of the CPU without passing through any running operating system.
Step 3: Check the connection with the Host PC (DFU Mode)
When you connect the device to the computer and select the USB interface in the STM32CubeProgrammer tool, a digital “handshake” process takes place. If the BOOT pin configuration in the previous step was correct, the chip’s BootROM will send identification parameters (Vendor ID and Product ID) to the computer. When you press Connect, the software will query the chip’s unique Serial Number and display the “Connected” status.
Technical Significance of a Successful Connection
A stable connection is practical proof that the Minimum Operating Conditions have been met:
- CPU “Live”: The main logic block of the microprocessor is active and executing code from the BootROM.
- Power System (PMIC/LDO): The core power lines (VDD, VDDCORE, VDD_USB) are providing stable voltage, without voltage drops or interference.
- Oscillator (Clocking): The quartz crystal (usually 24MHz) is oscillating accurately, allowing the frequency multiplier (PLL) inside the chip to generate the necessary clock pulse for the USB controller.
Analyzing the Cause of Connection Failure
If the computer reports “USB Device Not Recognized” or the device is not found, this is a warning signal of a physical error on the hardware layer:
- Clock Error: If the 24MHz quartz crystal is not working or is at the wrong frequency, the USB controller will be unable to synchronize data with the Host PC, resulting in the device being “invisible” to the software.
- Signal Integrity: The USB D+/D- differential signal pair needs to be checked. Even a small design flaw (such as unbalanced 90 Ohm impedance) or a mechanical contact error will interrupt the data flow.
- Boot Status: The CPU may still be stuck in Flash boot mode instead of entering Load Mode (DFU), requiring a check of the logic levels on the BOOT0/1/2 pins.
2. DDR Tuning & Stress Test
During the DDR bring-up process, simply checking if the RAM is fully recognized isn’t enough. We need to observe the Eye Diagram.
- If the ‘eye’ is wide and clean: The signal is extremely stable, the Timing and Voltage Swing parameters are optimized, and the system will run reliably 24/7 without crashing.
- If the ‘eye’ is narrow or noisy (Closed eye): This means the PCB traces are too long, there’s crosstalk, or the Drive Strength configuration is incorrect. This is the cause of random kernel panics that are very difficult to debug later.
The image above shows an Eye Diagram of a DDR signal, created by overlaying multiple signal cycles in the time domain. The horizontal axis represents time (ns) and the vertical axis voltage (V), allowing the signal’s timing and amplitude characteristics to be evaluated simultaneously. The “eye opening” is the time and voltage window in which data can be safely sampled: a clear, wide eye indicates low noise, small jitter, and good timing margin, ensuring stable high-speed data transmission on the DDR bus.

The pink and green traces are multiple samples of different data bits, overlaid to reflect the actual signal variation, while the reference voltage lines (e.g., ~0.6 V and ~0.9 V) serve as logic thresholds separating ‘0’ from ‘1’. This makes the Eye Diagram a crucial tool during bring-up and optimization: a properly calibrated eye is essential for accurate, reliable DDR initialization and data transfer.
After the computer recognizes the chip, the next step in bring-up is to get the RAM working. If the RAM is faulty, you will never be able to boot Linux.
- Tool: Use the DDR Tool tab in STM32CubeMX.

- Procedure:
  1. Load the RAM configuration file (DDR3/LPDDR3) into the tool.
  2. Click Test: the tool loads a small piece of code into the SRAM to test the read/write capability of the external RAM.
  3. DDR Training: the system automatically measures the skew (latency) of the signal lines on the PCB to derive the optimal set of parameters.
The Flashing Chain
Because the SRAM of the STM32MP1 is very small, we cannot load a system image of several hundred MB into Flash memory in one direct step. The process works as a chain of "bridges": a small loader is pushed in first to bring up the hardware needed by a larger loader, which in turn writes the actual data.
Step 1: Prepare the FlashLayout file (.tsv)
The configuration file (usually a Flash Layout) acts as the orchestrator of the entire system loading process, establishing key operating parameters for the loading tool. First, this file configures the physical layer by defining the connection protocol (USB/UART) to establish data flow between the Host PC and the Target. The core of the file is the Partition Map, which details the ID identifiers, system partition names (such as ssbl, boot, rootfs), and addresses mapped from executable files (.stm32, .bin) on the computer to the target memory.
A standard partition structure is established in a hierarchical logical sequence to ensure a reliable boot sequence: starting with the FSBL (TF-A) responsible for low-level hardware initialization, followed by the SSBL (U-Boot) acting as the second-stage loader, then the Boot Partition (containing the Kernel and DTB), and finally the RootFS (User Space) storage space. By tightly managing these indices, the flashing process ensures data integrity and consistency in the partition structure on the storage device.
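As a concrete illustration, a simplified FlashLayout sketch for an eMMC target might look like the following (the partition IDs, offsets, and file names here are illustrative assumptions, not taken from a real project; real .tsv files use tab-separated columns):

```text
#Opt	Id	Name	Type	IP	Offset	Binary
-	0x01	fsbl-boot	Binary	none	0x0	tf-a-serialboot.stm32
-	0x03	ssbl-boot	Binary	none	0x0	u-boot.stm32
P	0x04	fsbl	Binary	mmc1	boot1	tf-a.stm32
P	0x05	ssbl	Binary	mmc1	0x00080000	u-boot.stm32
P	0x21	boot	System	mmc1	0x00280000	boot.ext4
P	0x22	rootfs	FileSystem	mmc1	0x04680000	rootfs.ext4
```

The first two "-" entries are only loaded into RAM to run the DFU phase, while the "P" entries are actually programmed into the target memory. Once such a layout is prepared, STM32CubeProgrammer can typically drive the whole flashing sequence from it (e.g., STM32_Programmer_CLI -c port=usb1 -w FlashLayout.tsv).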

Step 2: Load the primer phase (DFU phase)
In the architecture of application-oriented microprocessors like the STM32MP1, when the system has no software in its internal memory (eMMC/SD), it enters a state of “empty” control resources. The Primer phase is a crucial intermediate step to establish the execution environment in RAM before data can be written to Flash.
1. Loading TF-A (FSBL) into Internal SRAM: Initializing the Physical Layer
When the “Download” command is triggered from the STM32CubeProgrammer, the first process is to push the TF-A (Trusted Firmware-A) file, the version supporting the USB protocol, into the Internal SRAM.
- Principle: Because at boot time, the external DDR RAM is not yet configured and cannot be used, the system is forced to utilize the SRAM integrated within the chip (limited capacity but available power and instantaneous access).
- Key Task: As soon as it’s loaded into SRAM, the TF-A performs the most important task: configuring the DDR RAM controller and setting the voltage and clock speed parameters for the external RAM module. This is the “opening step” to provide the system with sufficient temporary storage space for the subsequent steps.
2. Loading U-Boot (SSBL) into DDR RAM: Setting up the storage controller
After the TF-A confirms the DDR RAM is ready for operation, the computer pushes the U-Boot file (usually in .stm32 format) into this DDR RAM memory.
- Principle: Unlike the TF-A, which focuses only on low-level hardware initialization, the U-Boot is a second-stage loader (SSBL) with a rich driver system.
- Key task: When U-Boot is executed from DDR RAM, it activates more complex communication protocols such as SDMMC (for SD/eMMC cards) or QSPI. At this point, the processor officially becomes capable of “understanding” and “communicating” with Flash memory.
3. The Role of the DFU “Bridge”
At the end of the Primer Phase, the device has transformed from a rudimentary hardware block into a system with full resource control:
- SRAM acts as the bootstrap.
- DDR RAM acts as a large-capacity data buffer.
- U-Boot acts as the “manager,” executing write commands directly from the USB data stream to the blocks on the eMMC/SD Card.
When you click “Download” on STM32CubeProgrammer:
- Load TF-A into SRAM: The USB version of the TF-A file is pushed into SRAM. The chip starts running TF-A to initialize DDR RAM.
- Load U-Boot into DDR: After the RAM is loaded, the computer pushes the U-Boot file (usually u-boot-stm32mp1…stm32) into DDR. At this point, U-Boot takes control and activates the eMMC/SD Card drivers.
Step 3: Write the data to storage memory.
U-Boot will open a “channel” to receive data from the computer via USB and write it directly to the partitions on the eMMC/SD Card according to the scheme in the .tsv file.

Flashing Process Diagram (Sequence Diagram)
The process shown in the image illustrates the flashing steps for an embedded system (typically the STM32MP1 series) via DFU (Device Firmware Upgrade) mode. This process is performed using a step-by-step “priming” mechanism to gradually initialize complex hardware components.
First, when the device is in DFU mode, the host PC sends the first-stage bootloader (TF-A/FSBL) to the internal SRAM. After the TF-A executes and activates the external DDR RAM, the PC continues to send the second-stage bootloader (U-Boot/SSBL) to it. At this point, U-Boot acts as a controller to initiate communication with storage devices such as eMMC or SD cards. Finally, larger operating system files (such as Kernel or RootFS) are loaded into DDR RAM and then officially written to Flash memory. The process ends when all data has been successfully loaded into permanent storage, allowing the device to boot up independently afterward.
Checking and Verifying Boot Issues
The golden rule: “The earlier the error appears, the closer the problem is to the hardware.” We will divide this into four main bottlenecks corresponding to the four boot stages.
- Issue 1: No Boot Log
- Issue 2: Freezes at FSBL (DDR/RAM Error)
- Issue 3: Hanging at SSBL (Storage/MMC Error)
- Issue 4: Kernel Panic (Linux Kernel & RootFS Error)
The systematization of the Bring-up process is not just a set of individual tests, but a logical process that strictly adheres to the Hardware Boot Chain. The core principle is hierarchical diagnostics: validating minimum operating conditions at the low level before expanding to more complex peripherals.
Each issue analyzed, from the silence of the No Boot Log to the crash of the Kernel Panic, marks a specific stop point (hang) in the boot process. To visualize the diagnostic priority order and how these issues are distributed across architectural layers, the diagnostic diagram below provides an overview of the boot event sequence.

The diagram above clearly establishes a hierarchical diagnostic structure, progressing from the lowest Hardware & Low-level layer to the highest OS & Kernel Space layer. Red X marks indicate failed execution stages triggered by specific potential causes, while green checkmarks confirm successful execution steps.
This diagram serves as a quick diagnostic tool to pinpoint the problematic architecture layer. However, to transition from identifying errors (e.g., a hang at the Storage layer) to implementing specific technical solutions, a detailed action checklist is needed. The summary table below systematizes this visual diagram into a technical checklist, defining the components to be checked, the necessary tools, and specific implementation solutions for each error layer.
| Priority | Issue | Symptom Description | Component to Check | Checking Tool/Method | Specific Solution |
|---|---|---|---|---|---|
| 1 | Issue 1: Basic Physical Layer Failure (No Boot Log) | Board is completely silent. UART does not log. PC does not recognize USB DFU. | 1. Power Tree. 2. Clocking (24 MHz HSE). 3. BOOT configuration pins (BOOT[2:0]). | Multimeter; oscilloscope; observe switches/resistors. | Measure VDDCORE, VDD_DDR, VDD; measure the crystal waveform; set the BOOT switches to 000 (DFU). |
| 2 | Issue 2: RAM Initialization Failure (Hang at FSBL/TF-A) | Initial logs appear (BootROM/FSBL), then the board hangs completely. The last log is typically 'DDR Initialization'. | 1. DDR PHY VDD (power for the RAM physical layer). 2. DDR clock tree (PLL_DDR). 3. DDR parameters (Flash Layout TSV / Device Tree). | Multimeter (measure voltage drop); documentation (datasheet, schematic); the TSV configuration file. | Check for stable VDD_DDR power; check PLL_DDR in the DTB; compare the DDR timing parameters with the RAM chip datasheet. |
| 3 | Issue 3: Storage Memory Communication Failure (Hang at SSBL/U-Boot) | U-Boot runs and recognizes RAM, but hangs when accessing the SD/eMMC. The last log is 'Mounting' or 'Failed to read partition'. | 1. SDMMC VDD/VCC (power for the SD card/eMMC). 2. SDMMC clock tree. 3. Driver configuration (U-Boot DTB / Flash Layout). | Multimeter; check the TSV file for partition locations; check the U-Boot Device Tree. | Ensure the SDMMC power rail is operational; validate the partition IDs/addresses in the Flash Layout (.tsv). |
| 4 | Issue 4: Linux Kernel & RootFS Failure (Kernel Panic) | U-Boot successfully loads the Kernel/DTB. The kernel begins to run, then crashes (Panic). The console displays 'Kernel Panic' or 'VFS: Unable to mount root fs'. | 1. Kernel image integrity (data corruption). 2. Kernel DTB configuration (incorrect parameters). 3. RootFS integrity (partition corruption). | Checksum (MD5/SHA); check the EXT4 format of the RootFS. | Verify the checksums of the Kernel/RootFS files; attempt to recreate the RootFS partition. |
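For the checksum check in Issue 4, the verification is usually done on the host before flashing. A minimal sketch using Python's hashlib (the file name in the comment is a hypothetical example):

```python
import hashlib

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large images (Kernel/RootFS) never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the digest recorded when the image was built, e.g.:
#   assert file_sha256("rootfs.ext4") == expected_digest
```

If the digest computed on the board side (or after readback) differs from the build-time digest, the image was corrupted in transit and re-flashing is the first remedy.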
STM32MP1 Bootchain: From BootROM to Linux
In embedded systems running Linux, the boot process is not simply a matter of turning on the power and running the operating system. Especially with the STM32MP1 processor family, the bootchain is designed as a multi-stage boot to ensure flexibility, security, and the ability to configure complex hardware.
From the moment power is applied, the system goes through a series of consecutive steps: from the internal BootROM, to a low-level bootloader like Trusted Firmware-A, followed by a high-level bootloader like U-Boot, and finally to the Linux kernel and user space. Each stage has its own role and is closely interdependent.
Understanding the bootchain is not only crucial for grasping how the system boots, but it’s also key to:
- Debug boot errors (TF-A freezes, DDR errors, kernel panics, etc.)
- Customize the system as needed (change boot mode, optimize boot time)
- Develop and integrate firmware efficiently
In this article, we will delve into each stage of the bootchain on the STM32MP1, accompanied by illustrative diagrams and detailed analysis so you can understand the entire process from power-up to Linux being ready to operate.
Why does the STM32MP1 need multi-stage boot?
Unlike simpler microcontrollers, the STM32MP1 is a complex microprocessor unit (MPU). It requires a multi-stage boot for two core reasons. First, there is the limitation of internal memory: upon power-on, the system only has a small amount of internal SRAM (less than 256KB). This is insufficient to hold the entire Linux kernel (tens of MB in size); therefore, intermediate steps are needed to "wake up" the external RAM (DDR). Second, flexibility and security matter. Breaking the boot into stages allows developers to customize hardware configurations (such as RAM type and storage devices) and establish security layers (Secure Boot) before the operating system officially takes control.
Overview of the main stages:
- BootROM: Immutable code baked into the chip, responsible for finding the first boot device.
- FSBL (First Stage Bootloader): Usually TF-A, its main task is to configure DDR RAM and set up a secure environment.
- SSBL (Second Stage Bootloader): Usually U-Boot, provides file management features and prepares the kernel loading environment.
- Linux Kernel: The heart of the operating system, managing resources and controlling peripheral devices.
- RootFS: The file system containing applications, libraries, and the end-user interface.
ROM Code

The ROM code in the STM32MP1 is immutable: it cannot be changed, deleted, or overwritten by the user, making it the most trusted software in the entire system. Immediately after the reset signal is released, the ROM code begins execution almost instantaneously, ensuring a fast and stable start. Importantly, it operates in a very limited environment: at this point the system has no external RAM and no high clock speed, so the ROM code relies solely on the chip's internal oscillator and SRAM to perform its initial tasks.

ROM Code schematic diagram
The ROM code checks the physical state (high/low) of the dedicated pins on the chip. Based on this, it knows where to find the FSBL file (SD Card, eMMC, NAND/NOR Flash, USB (DFU Mode), etc.). To read data from an SD Card or USB, the ROM code contains extremely rudimentary drivers for the SDMMC, SPI, or USB controllers. It doesn’t need an operating system; it communicates directly with the hardware at the lowest level.
The ROM code searches for a special data structure called the STM32 Header in the first bytes of external memory. This header contains information about the file size and destination address in SRAM.
If Secure Boot mode is enabled, the ROM code uses encryption algorithms (such as ECDSA) to verify the digital signature of the FSBL. If the file has been illegally modified, the chip will refuse to boot to protect the system.
After everything is valid, the ROM Code copies the entire FSBL (TF-A) file to the Internal SRAM. Then, it executes a jump instruction (Branch) to the starting address of the FSBL. From this moment on, the ROM Code “retreats” and does not participate in the operation until the next Reset.
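The header check above can be illustrated with a short host-side sketch. The real STM32 header begins with the ASCII magic "STM2"; the rest of the layout here (a bare length and load-address field) is deliberately simplified and does not match the full header specification:

```python
import struct

MAGIC = b"STM2"  # ASCII magic at the start of an STM32 image header

def parse_header(blob: bytes) -> dict:
    """Parse a toy three-field header: magic, image length, load address.

    Simplified illustration only; the real STM32 header also carries a
    signature, checksum, version, and entry point, at different offsets.
    """
    magic, length, load_addr = struct.unpack_from("<4sII", blob, 0)
    if magic != MAGIC:
        # Mirrors the ROM code's behavior: no valid header, no boot.
        raise ValueError("no STM32 header found - refusing to boot")
    return {"length": length, "load_addr": load_addr}

# Build a fake header the way a packaging tool might, then parse it back
# (the load address below is purely illustrative):
blob = struct.pack("<4sII", MAGIC, 0x4000, 0x2FFC2500)
print(parse_header(blob))
```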
First Stage Boot Loader (FSBL)
TF-A is an industry standard from ARM. It was chosen as the FSBL for the STM32MP1 because of its robust security handling capabilities and the ability to perform hardware-intensive initialization that conventional bootloaders struggle to achieve within the limited memory space of SRAM.

FSBL Schematic diagram
Task: (DDR Initialization) – This is the most important task. The DDR controller is an extremely complex hardware unit. The FSBL must:
- Set precise timing parameters down to the nanosecond for the type of RAM being used (DDR3, LPDDR3, etc.).
- Perform DDR Training: Send test data strings to align the signal between the chip and RAM, ensuring that data is not corrupted during high-speed transmission.
- Result: After this step, the system, which initially had only ~256KB of SRAM, can now access 512MB – 1GB of DDR RAM.
The ROM code runs the CPU at a very low clock speed. The FSBL configures the clock multipliers (PLLs) to push the CPU clock higher, allowing the operating system to load faster. It also sets up the voltage regulators (PMIC) to provide stable power to the other components.
The STM32MP1 uses TrustZone technology. The FSBL is responsible for clearly defining:
- Secure World: Where sensitive tasks (encryption, digital signatures) run.
- Normal World: Where Linux and user applications run. It establishes separations to prevent Linux from illegally interfering with secure memory areas.
The FSBL runs entirely within internal SRAM. Because this space is very small, TF-A is designed in smaller “stages” like BL2. It has no user interface; you only see short logs via the serial port indicating whether the DDR initialization process was successful or failed.
Second Stage Boot Loader (SSBL)

U-Boot’s core functions:
- File System Management: This is U-Boot’s strongest point. It has drivers to understand partition formats like FAT32 or EXT4. This allows it to find the correct uImage or zImage file located deep within the memory card’s directories and load it into RAM.
- Device Tree (DTB) Usage: U-Boot not only loads the Kernel but also the Device Tree file. This is a map describing the hardware (which pins are for LEDs, which are for I2C, etc.). U-Boot can edit this map "on the fly" (while running) before handing it to Linux, making the system more flexible.
- Interactive Environment (U-Boot Shell): This is the only stage in the Bootchain where humans can intervene.
U-Boot on the STM32MP1 is usually stored in a separate partition (often named ssbl). If you want to change the boot logo or change the boot method (for example, booting over the network instead of the memory card), you just need to edit and re-flash this partition without touching the TF-A or Kernel.
Linux Kernel

Linux Kernel schematic diagram
The Linux kernel goes through several crucial steps to complete the system boot process:
- First, the kernel initializes the MMU (Memory Management Unit) to translate physical addresses into virtual addresses, isolating and protecting memory between applications and keeping the system stable even if a process fails.
- Next, the kernel reads the Device Tree to load drivers and configure peripherals such as the display, the network stack (TCP/IP for Ethernet/Wi-Fi), and industrial protocols (CAN, RS485).
- Then, the kernel attempts to mount the root file system (rootfs) from a storage device such as an SD card or eMMC; failure results in a "Kernel Panic". Upon successful mounting, the system executes the first process, /sbin/init (systemd or BusyBox), marking the transition to user space, where user applications begin running. In parallel, the kernel can use the RemoteProc framework to load firmware onto the Cortex-M4 core and start it, handling real-time tasks while Linux processes higher-level work.
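Whether this mount succeeds is largely decided by the kernel command line that U-Boot passes along. A typical setting from the U-Boot shell looks like the following; the device node and partition index are illustrative assumptions that must match your actual Flash Layout:

```text
setenv bootargs root=/dev/mmcblk0p4 rootwait rw console=ttySTM0,115200
```

If root= points at the wrong partition, the boot ends in exactly the "VFS: Unable to mount root fs" panic mentioned earlier.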
Linux Init process

Linux Init process schematic diagram
When the kernel successfully mounts the RootFS partition (root file system), it finds and runs the first executable file. By default, this file is located at /sbin/init. The Init Process is the “parent” of all other processes in the Linux system. Its process identifier (PID) is always 1.
This final layer of the bootchain on the STM32MP1 runs in user space: it executes network configuration scripts and starts applications (a web server, a Qt interface, or simply a login prompt).
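When BusyBox provides /sbin/init, these first user-space actions are described in /etc/inittab. A minimal illustrative example follows; the script path and UART device name are assumptions, not taken from a specific board:

```text
# /etc/inittab - BusyBox format: <id>::<action>:<process>
# Run the startup scripts (mounts, network configuration) once at boot:
::sysinit:/etc/init.d/rcS
# Keep a login prompt alive on the UART console:
ttySTM0::respawn:/sbin/getty -L ttySTM0 115200 vt100
```

The respawn action is what restarts the login prompt every time a session ends, which is why PID 1 must never exit.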
STM32MP135 DDR Configuration Using STM32DDRFW-UTIL
1. Introduction to STM32DDRFW-UTIL
2. Overview and Download of STM32DDRFW-UTIL
3. DDR Configuration Using STM32CubeMX
4. Configuring DDR Test Firmware
5. Flashing DDR Test Firmware Using STM32CubeProgrammer
6. Running DDR Memory Tests on STM32MP135
7. Conclusion
1. Introduction to STM32DDRFW-UTIL
STM32DDRFW-UTIL is a specialized firmware utility provided by STMicroelectronics designed for the initial bring-up and validation of DDR memory on STM32MP1 series microprocessors (including STM32MP135 and STM32MP157).
This utility acts as a bridge between the hardware and the STM32CubeProgrammer interface, allowing developers to:
- DDR Initialization: Configure the DDR controller and PHY timing parameters before the bootloader (FSBL/SSBL) is even loaded.
- Configuration Validation: Verify that the DDR settings generated in STM32CubeMX match the physical RAM chip specifications.
- Comprehensive Testing: Execute built-in diagnostic routines to ensure stability: Basic Tests, quick connectivity and data bus integrity checks. Intensive Tests, deep memory cell verification to catch intermittent bit flips. Stress Tests, high-load cycles to validate Signal Integrity (SI) and thermal stability.
- Hardware Debugging: Identify PCB routing issues, impedance mismatches, or power supply V_REF / V_TT fluctuations during the board bring-up phase.
2. Overview and download of STM32DDRFW-UTIL Tool
Link download: https://www.st.com/en/development-tools/stm32ddrfw-util.html

STM32DDRFW-UTIL is a toolkit provided by STMicroelectronics to support testing and evaluating the performance of DDR memory on STM32MP1 series microprocessors. This tool includes firmware tests, BSP libraries, HAL drivers, and sample projects for the STM32CubeIDE development environment. After downloading, users can build the firmware to create a .stm32 file, then flash it directly onto the board to perform RAM tests.

STM32DDRFW-UTIL offers various test types such as Basic test, Intensive test, and Stress test, allowing for testing of issues related to DDR configuration, timing, data bus, and memory stability. This enables engineers to detect hardware or configuration errors early during the bring-up process before the system boots up the bootloader or Linux operating system.
This toolkit can be downloaded directly from the STMicroelectronics development page. After downloading, users should extract the files and review the folder structure to understand the components, such as the firmware tests, sample projects, and DDR configuration files. This makes customizing the board easier and more accurate.
3. DDR Configuration Using STM32CubeMX
To configure DDR for the system:
1. Open STM32CubeMX and select ACCESS TO MCU SELECTOR to find the STM32MP135 processor. After selecting the correct MPU, create a new project.
2. In the configuration interface, go to the System Core tab → DDR to set the memory parameters. Select the exact RAM type in use, for example DDR3L Zentel A3T4GF40BBF.
3. Switch to the Clock Configuration tab to configure the clock for the DDR controller. Set PLL2R = 533 MHz, which is the operating frequency of the DDR.
4. Switch to the Project Manager tab and select GENERATE CODE. When generation completes, open the folder …/<project_name>/DeviceTree/<project_name>/tf-a/ and locate the stm32mp13-mx.dtsi file. This crucial file contains the complete DDR register configuration used to initialize and control the DDR memory on the system.

Setting the frequency in the Clock Configuration tab
4. Configuring DDR Test Firmware
To configure the firmware for DDR testing, first open STM32CubeIDE and import the project template STM32MP135C-DK. This is the project template provided for running the DDR Test Tool on the STM32MP135 processor line. After importing the project, check and adjust the PMIC configuration to match the board’s hardware. On the ST reference board, the system uses an STPMIC with an I2C address of 0x33. On customized boards, however, the PMIC may differ, for example an MP5470 with an I2C address of 0x11. In that case, edit the corresponding parameters in the project’s configuration files, specifically stm32mp13xx_disco_bus.h and stm32mp13xx_util_conf.h, so the firmware can communicate with the PMIC correctly and supply the right power level to the DDR during testing.

Refer to line 57 in stm32mp13xx_disco_bus.h

Change param in stm32mp_utils_conf.h
In main.c, the SystemPower_Config() function needs to be modified: it is responsible for initializing the I2C interface to communicate with the PMIC and provide power to the RAM. While the reference firmware for the STM32MP135C-DK performs strict checks, such as verifying the PMIC version, these steps are unnecessary for the Onekiwi board since it does not use an official ST PMIC, so they should be bypassed.


Comment out code in BSP_PMIC_Init()
The RAM configuration is handled in stm32mp13xx-ddr3-4Gb-template.h. Copy the contents of the stm32mp13-mx.dtsi file (generated from the previous RAM configuration step) and paste them into this header file. Once the configuration is accurately adjusted for the board, build the project and proceed to flash it onto the board.
5. Flashing DDR Test Firmware Using STM32CubeProgrammer

After the project build completes in STM32CubeIDE, the system creates a firmware file in .stm32 format. This file is used to load the DDR Test Tool program onto the STM32MP135 for RAM testing. Before flashing the firmware, connect the board to the computer via the USB Type-C port. At the same time, open a terminal such as MobaXterm to monitor logs from the board's UART port while the tool runs. Next, use STM32CubeProgrammer to load the firmware. Open a CMD window and run the command: STM32_Programmer_CLI.exe -c port=USB1 -w "path_to_file.stm32" 0x01 -s 0x01
Example: STM32_Programmer_CLI.exe -c port=USB1 -w "C:\Users\Admin\workspace\STM32DDRFW-UTIL\DDR_Tool\STM32MP135C-DK\STM32MP135C-DK_DDR_UTILITIES_A7\DK\STM32MP135C-DK_DDR_UTILITIES_A7.stm32" 0x01 -s 0x01

In this command, the “-w” parameter specifies the path to the newly built .stm32 file. After the command is successfully executed, the DDR test firmware will be loaded into the MPU, and the system will start running the DDR Utility Tool, ready to perform RAM testing commands.
6. Running DDR Memory Tests on STM32MP135
After the firmware is loaded and starts running on the STM32MP135, the MobaXterm terminal window will display a command prompt as shown in the image:

This prompt indicates that the DDR Utility Tool has started successfully and is ready to receive commands from the user. Here, you can enter commands to test and evaluate the performance of your DDR memory. Enter help to display the list of supported commands, then try commands such as:
help
info
freq
...
Through these commands, users can view DDR configuration information, change parameters, and run RAM tests to confirm stable memory operation on the system. You can test using the following commands:
- Command info:
| Command-line | Expected result | Verdict |
|---|---|---|
| info | step = 0 : DDR_RESET | PASS |
| name = DDR3-1066/888 bin G 1x4Gb 533MHz v1.45 | ||
| size = 0x20000000 | ||
| speed = 533000 kHz | ||
| cal = 0 | | |
- Command freq:
| Command-line | Expected result | Verdict |
|---|---|---|
| freq | DDRPHY = 528000 kHz | PASS |
- Command param:
| Command-line | Expected result | Verdict |
|---|---|---|
| param | ==ctl.static== | PASS |
| mstr = 0x00040401 | ||
| mrctrl0 = 0x00000010 | ||
| mrctrl1 = 0x00000000 | ||
| derateen = 0x00000000 | ||
| derateint = 0x00800000 | ||
| pwrctl = 0x00000000 | ||
| pwrtmg = 0x00400010 | ||
| hwlpctl = 0x00000000 | ||
| rfshctl0 = 0x00210000 | | |
- Change parameter DDR:
| Command-line | Expected result | Verdict |
|---|---|---|
| param mstr | mstr = 0x00040401 | PASS |
| param mstr 0x00040402 | mstr = 0x00040402 | PASS |
| param mstr | mstr = 0x00040402 | PASS |
| param mstr 0x00040401 | mstr = 0x00040401 | PASS |
- State transition DDR:
| Command-line | Expected result | Verdict |
|---|---|---|
| next | 1 : DDR_CTRL_INIT_DONE | PASS |
| step 3 | step to 3 : DDR_READY | PASS |
| 1 : DDR_CTRL_INIT_DONE | ||
| 2 : DDR_PHY_INIT_DONE | ||
| 3 : DDR_READY |
- Command test RAM
| Command-line | Expected result | Verdict |
|---|---|---|
| test help | displays test commands | PASS |
| 0 : Test All | ||
| 1 : Test Simple DataBus |
- Test Data Bus
| Command-line | Expected result | Verdict |
|---|---|---|
| test 1 0xc0000000 | Result: Pass [Test Simple DataBus] | PASS |
- Test RAM
| Command-line | Expected result |
|---|---|
| test 0 | result 1: Test Simple DataBus = Passed |
| result 2: Test DataBusWalking0 = Passed | |
| result 3: Test DataBusWalking1 = Passed | |
| result 4: Test AddressBus = Passed | |
| result 5: Test MemDevice = Passed | |
| result 6: Test SimultaneousSwitchingOutput = Passed | |
| result 7: Test Noise = Passed | |
| result 8: Test NoiseBurst = Passed | |
| result 9: Test Random = Passed | |
| result 10: Test FrequencySelectivePattern = Passed | |
| result 11: Test BlockSequential = Passed | |
| result 12: Test Checkerboard = Passed | |
| result 13: Test BitSpread = Passed | |
| result 14: Test BitFlip = Passed | |
| result 15: Test WalkingZeroes = Passed | |
| result 16: Test WalkingOnes = Passed | |
| Result: Pass [Test All] | |
7. Conclusion
The STM32DDRFW-UTIL toolkit provides an efficient and practical method for validating DDR memory during the early hardware bring-up stage of STM32MP135 and other STM32MP1 microprocessors from STMicroelectronics. By allowing developers to initialize DDR, verify configuration parameters generated by STM32CubeMX, and execute multiple memory diagnostic tests, the tool helps ensure that the DDR subsystem operates reliably before the system boots the bootloader or Linux operating system.
Through built-in commands and comprehensive testing routines—ranging from simple data bus verification to full stress testing—engineers can quickly identify potential issues such as incorrect DDR timing parameters, signal integrity problems, PCB routing errors, or unstable power supply conditions. This greatly simplifies the debugging process and reduces development time during the hardware validation phase.
In summary, integrating STM32DDRFW-UTIL into the development workflow enables engineers to confidently validate DDR hardware and configuration, ensuring a stable foundation for the entire embedded software stack.
Mastering Flash Memory: Architecture and Protocols
1. Flash Memory Overview
2. Flash Memory Protocol: SPI, QSPI, OSPI
2.1. Protocols in NOR Flash Memory
2.2. Protocols in NAND Flash Memory
2.3. NAND vs. NOR Interface Comparison
3. Conclusion
1. Flash Memory Overview
Flash memory is a type of non-volatile memory. This means that data is retained even when the device is powered off. Unlike many traditional memory types, Flash memory allows data to be erased and rewritten using electrical signals instead of requiring hardware replacement. Therefore, it has become a popular storage solution in modern electronic devices.
Today, Flash memory is widely used in many devices such as:
- USB flash drive
- SD/microSD memory card
- Smartphones
- Digital cameras
- Tablets and IoT devices
- SSD (Solid State Drive)
A major advantage of Flash Memory is that it has no moving mechanical parts, making it more durable, energy-efficient, and more resistant to vibration than traditional hard drives (HDDs). Flash Memory is currently divided into two main architectures: NOR Flash and NAND Flash; these two types differ in circuit structure, access speed, capacity, and intended use.
NOR FLASH MEMORY

In the image, you can see that each transistor (memory cell) has its drain terminal directly connected to the bit line (in the image, the vertical line connected to V_D) and its source terminal connected to ground (Gnd). This is characteristic of NOR architecture, allowing random access to each memory cell independently, creating a parallel structure.
Operating mechanism:
- When reading data from cell A, the system activates the central horizontal line (V_CG) and the central vertical line (V_D).
- If cell A contains electrons (currently storing 0), it will block current flow. If there are no electrons (currently storing 1), current will flow from V_D to ground (Gnd).
- The sensor will measure this current to determine whether the cell is 0 or 1.
Due to its parallel design, data travels directly from the memory cell to the bit line. It has high reliability and is suitable for storing system source code (such as computer BIOS or phone firmware) because it is less prone to bit errors than NAND.
NAND FLASH MEMORY

Due to the serial design, where transistors are connected in a chain, data must "queue" through the other cells to reach the bit line, making random access slower than in NOR. However, thanks to this structure, NAND achieves enormous capacity at low cost, making it extremely well suited to SSDs, USB drives, and memory cards, where huge amounts of data such as images, videos, and files must be stored.
Operating Mechanism:
- When reading data at cell B, the system activates the central horizontal line (V_CG, or WL1) with a specific read voltage.
- Key Difference: Unselected cells in the same column (such as cells A and C) are forced to “open” (apply a high voltage V_pass) so that current can flow through them, regardless of what they contain.
- If cell B contains an electron (currently holding a 0), it will block the current of the entire chain. If cell B is empty (currently holding a 1), the current will flow freely from V_D, through A, through B, through C, and then to ground.
- The sensor will measure the current at the end of the chain to determine whether cell B’s value is 0 or 1.
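The read mechanism above can be modeled in a few lines of Python (a conceptual sketch, not device firmware): unselected cells are driven with V_pass and always conduct, so only the selected cell decides whether current reaches the end of the chain.

```python
def read_nand_cell(string_charges, selected):
    """Model a NAND string read as described above.

    string_charges: list of booleans, True if the cell's floating gate
    holds electrons (stores 0), False if it is empty (stores 1).
    selected: index of the cell being read (e.g. cell B).

    Unselected cells receive V_pass and conduct regardless of their
    contents; only the selected cell, driven at the read voltage,
    can block the chain.
    """
    conducts = all(
        True if i != selected          # V_pass: always conducts
        else not charged               # V_read: conducts only if empty
        for i, charged in enumerate(string_charges)
    )
    # Current at the end of the chain => cell stores 1; no current => 0
    return 1 if conducts else 0

# Chain A-B-C: A stores 0 (charged), B stores 1 (empty), C stores 0
print(read_nand_cell([True, False, True], selected=1))  # reading B -> 1
print(read_nand_cell([True, True, True], selected=1))   # B charged -> 0
```

The same model explains NOR: with parallel cells, each transistor sits alone between the bit line and ground, so no V_pass trick is needed.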
Comparison between NAND Flash Memory vs NOR Flash Memory
| Feature | NAND Flash | NOR Flash |
|---|---|---|
| Memory Cell Connection | Cells connected in series (NAND structure) | Cells connected in parallel (NOR structure) |
| Access Method | Page-based read/write, block erase | Random access read, byte-level access |
| Read Speed | Moderate | Very fast |
| Write Speed | Fast | Slower than NAND |
| Erase Speed | Fast (block erase) | Slower |
| Storage Capacity | Very high (GB to TB range) | Lower capacity (usually MB to small GB) |
| Cost per Bit | Lower | Higher |
| Code Execution | Not suitable for direct code execution | Supports Execute In Place (XIP) |
| Reliability / Endurance | Lower endurance than NOR | Higher endurance |
| Typical Applications | SSD, USB drives, SD cards, smartphones | Firmware storage, BIOS, embedded systems |
2. Flash Memory Protocols: SPI, QSPI and OSPI
2.1. Protocols in NOR Flash Memory
SPI in NOR Flash Memory

SPI NOR Flash is a popular type of Flash memory in embedded systems due to its simple interface design and high reliability. One of the key advantages of the SPI protocol is its multi-slave capability on the same bus. In this architecture, multiple Flash chips can share the data and clock lines (MOSI, MISO, and SCLK). The microcontroller (Master) only needs a separate Chip Select (CS/SS) line for each device, saving GPIO pins on the microcontroller and simplifying the hardware design.
In terms of data transfer performance, SPI operates using a single-bit transfer mechanism, transmitting 1 bit of data per clock cycle. While this speed is sufficient for many basic embedded applications, in systems requiring higher bandwidth, the SPI protocol can be extended to Dual SPI (2 data lines) or Quad SPI (4 data lines) by utilizing additional functional pins of the Flash chip.
The standard SPI protocol is the most basic communication platform between a microcontroller and NOR Flash memory. It uses a synchronous serial data transmission and reception mechanism with a 4-wire main signal structure:
- SCLK (Serial Clock – Red): The synchronization clock generated by the Master. All data bit read/write operations are based on the timing of this clock.
- MOSI (Master Out Slave In – Green): The command and data transmission line from the Master to the Flash chip (e.g., sending read commands, erase commands, or memory addresses).
- MISO (Master In Slave Out – Blue): The data transmission line from the Flash chip back to the Master (e.g., the content of the data being read).
- SS/CS (Slave Select / Chip Select – Yellow): Chip selection signals (SS1, SS2, SS3 separate for each Slave). When communicating with a specific NOR Flash chip, the Master pulls the corresponding SS pin low (Logic 0). The remaining chips will be in a “sleep” state and ignore all signals on the common bus.
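To make the single-bit mechanism concrete, here is a minimal Python simulation of one SPI byte exchange (mode 0, MSB first) — a conceptual sketch rather than real driver code. Each clock cycle moves exactly one bit on MOSI and one on MISO, so after 8 cycles the two shift registers have swapped contents.

```python
def spi_transfer_byte(master_byte, slave_byte):
    """Simulate one standard SPI byte exchange (mode 0, MSB first).

    Each clock cycle shifts exactly one bit in each direction:
    the master drives MOSI while the slave drives MISO.
    """
    miso_received = 0
    mosi_received = 0
    for bit in range(7, -1, -1):               # 8 clock cycles, MSB first
        mosi = (master_byte >> bit) & 1        # master -> slave on MOSI
        miso = (slave_byte >> bit) & 1         # slave -> master on MISO
        mosi_received = (mosi_received << 1) | mosi
        miso_received = (miso_received << 1) | miso
    return miso_received, mosi_received

# Master shifts out a hypothetical 0x03 opcode; slave shifts out 0xA5
print(spi_transfer_byte(0x03, 0xA5))  # -> (165, 3)
```

This full-duplex swap is why SPI read commands first shift out an opcode and address while ignoring the bits that come back, then shift out dummy bytes while capturing the data returned on MISO.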
QSPI in NOR Flash Memory

Quad-SPI (QSPI) extends the SPI protocol by utilizing four bidirectional data lines labeled IO0, IO1, IO2, and IO3. This allows QSPI to transmit 4 bits of data per clock cycle, increasing bandwidth fourfold compared to standard SPI, which transmits only 1 bit per cycle. In standard SPI mode, some pins on the Flash chip have special functions such as Q2/nWP (Write Protect) and Q3/nHOLD (Hold). In Quad-SPI mode, however, these pins are repurposed as data lines, becoming IO2 and IO3.
In addition to data lines, QSPI still uses familiar control signals:
- CLK: data transmission synchronization clock
- nCS (Chip Select): signal to select the chip to operate at a low level
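The bandwidth gain is easy to quantify. The sketch below (illustrative Python; the nibble ordering on the IO lines varies by device) shows why QSPI needs only 2 clock cycles per byte instead of 8.

```python
def clock_cycles(num_bytes, lines):
    """Clock cycles needed for the data phase, given the IO line count."""
    return num_bytes * 8 // lines

# One byte: 8 cycles on single-line SPI, 2 cycles on 4-line QSPI
print(clock_cycles(1, lines=1))  # 8
print(clock_cycles(1, lines=4))  # 2

def qspi_nibbles(byte):
    """Split a byte into the two 4-bit groups driven on IO3..IO0,
    high nibble first (a common, but device-specific, ordering)."""
    return [(byte >> 4) & 0xF, byte & 0xF]

print(qspi_nibbles(0xA7))  # [10, 7]
```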
OSPI in NOR Flash Memory
The diagram shows how the Octal SPI Manager coordinates extremely complex signals to achieve maximum speed:
- 8 data lines (Data[7:0]): Divided into two groups (Data[3:0] and Data[7:4]). This allows simultaneous transmission of 8 bits (1 byte) in each clock cycle, doubling the bandwidth compared to QSPI.
- DQS (Data Strobe) pin: This is a “vital” component in the diagram. At extremely high speeds (above 100MHz), the clock signal is prone to phase shift. The DQS pin acts as a feedback signal from the Flash chip to the MCU to ensure that data is accurately sampled at the most stable point.
- Multi-port system (Port 1 & Port 2): The structure shown in the figure allows the MCU to communicate in parallel with two OSPI devices simultaneously, or flexibly manage between an OSPI NOR Flash chip and an OSPI RAM chip.
In terms of operation, OSPI completely changes the way data is transmitted through advanced modes such as SDR (1 byte per cycle) and especially DDR/DTR (Double Data Rate) – transmitting data on both the rising and falling edges of the clock, pushing speeds up to 2 bytes per cycle. Despite its power, the OCTOSPIM controller remains flexible and backward compatible: its data pins can easily be reconfigured to function as a conventional SPI or QSPI interface.
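The combined effect of line count and DDR can be estimated with a simple model (illustrative Python; real transfers also spend cycles on command, address, and dummy phases, which this ignores):

```python
def transfer_time_us(num_bytes, clock_mhz, lines=8, ddr=False):
    """Approximate data-phase time for a serial Flash transfer.

    DDR moves data on both clock edges, doubling bits per cycle.
    """
    bits_per_cycle = lines * (2 if ddr else 1)
    cycles = num_bytes * 8 / bits_per_cycle
    return cycles / clock_mhz  # clock period in us is 1/MHz

# 4 KiB of data over an 8-line bus at 100 MHz
print(transfer_time_us(4096, 100, lines=8, ddr=False))  # 40.96 us (SDR)
print(transfer_time_us(4096, 100, lines=8, ddr=True))   # 20.48 us (DDR)
```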
2.2. Protocols in NAND Flash Memory
SPI in NAND Flash Memory

There are a few key technical details that differentiate NAND and NOR, although both use the same SPI/QSPI interface:
- ECC (Error Correction Code) Block: This is the most obvious distinguishing feature. Because NAND Flash frequently experiences bit flips during use, it requires an integrated ECC error control unit. NOR Flash is much more stable and usually doesn’t need this block.
- Cache Memory: NAND Flash doesn’t allow direct byte-by-byte reading from the main memory array. Data must be loaded from the NAND array into a cache/page buffer before being pushed out via the SPI interface. The diagram above illustrates this flow.
- Quad-SPI Interface: The signal pins at the top (SIO0 to SIO3) indicate that the chip supports Quad mode, which increases the data transfer speed for NAND (whose read latency is higher than NOR’s).
- Internal MCU: Some modern SPI NAND chips incorporate a small controller (MCU) inside to manage complex tasks such as bad block management and automatic ECC algorithms.
QSPI in NAND Flash Memory
The core difference in the operation of QSPI NAND compared to NOR is the two-stage data reading process.
- Stage 1: Page Read (From NAND Array to Cache): The MCU sends a Page Read command along with the address via the SPI protocol. The NAND chip then automatically reads data from the main storage array (NAND Array) and loads it into the internal Cache Register. This stage incurs a waiting period (busy time).
- Stage 2: Read From Cache (From Cache to MCU via QSPI): After the data is in the Cache, the MCU sends a read command from the cache. At this point, data is simultaneously pushed out on all four IO lines (IO0, IO1, IO2, IO3). With 4 bits transmitted per clock cycle, the speed is four times faster than traditional SPI.
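The two-stage flow can be sketched as a toy model (illustrative Python; real drivers issue specific opcodes and poll a status register for the busy flag, per the chip’s datasheet):

```python
class SpiNandModel:
    """Toy model of the two-stage SPI NAND read flow described above."""

    def __init__(self, pages):
        self.pages = pages          # page address -> page contents
        self.cache = b""
        self.busy = False

    def cmd_page_read(self, page_addr):
        """Stage 1: copy one page from the NAND array into the cache.
        On real hardware this is where the busy time (tR) is spent."""
        self.busy = True
        self.cache = self.pages[page_addr]
        self.busy = False           # driver would poll status here

    def cmd_read_from_cache(self, offset, n):
        """Stage 2: stream bytes from the cache to the MCU; with QSPI
        this phase runs on all four IO lines at once."""
        assert not self.busy, "must poll status until not busy"
        return self.cache[offset:offset + n]

nand = SpiNandModel({0: b"hello-page-0"})
nand.cmd_page_read(0)                    # stage 1 (busy time)
print(nand.cmd_read_from_cache(0, 5))    # stage 2 -> b'hello'
```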
OSPI in NAND Flash Memory
While traditional NAND Flash uses a complex parallel interface with many pins, modern embedded systems favor Serial NAND Flash (often called SPI NAND) due to its low pin count and simplified PCB design.
SPI & QSPI in NAND Flash
- SPI Mode: The basic communication mode using two data lines (SI and SO). It is reliable but limited in speed, suitable for low-bandwidth data logging.
- QSPI Mode (Quad-SPI): This is the most popular performance tier for SPI NAND. It repurposes the WP# and Hold# pins as additional data lines (IO2 and IO3). By transmitting 4 bits per clock cycle, QSPI significantly reduces the time needed to read large data blocks from the NAND cache to the MCU.
Why not OSPI in NAND?
Unlike NOR Flash, OSPI (Octal SPI) is extremely rare in NAND Flash. This is because the internal “Page Read” time of NAND is a physical bottleneck. Increasing the interface to 8 bits (OSPI) provides diminishing returns since the system still has to wait for the NAND array to move data into the cache.
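This bottleneck is easy to see with rough numbers (illustrative Python; the 25 µs array-read latency and 2 KiB page size are assumptions for the sake of the example, not datasheet values):

```python
def page_read_time_us(lines, t_array_us=25.0, page_bytes=2048,
                      clock_mhz=100.0):
    """Total time to get one page out of a NAND chip: a fixed
    array-to-cache latency plus the interface transfer time."""
    t_xfer = page_bytes * 8 / lines / clock_mhz
    return t_array_us + t_xfer

for lines in (1, 4, 8):
    print(lines, round(page_read_time_us(lines), 2))
# x1 -> 188.84 us, x4 -> 65.96 us, x8 -> 45.48 us: going from 4 to 8
# lines saves far less than going from 1 to 4, because the fixed
# array-read latency starts to dominate the total.
```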
2.3. NAND vs. NOR Interface Comparison
| Feature | NOR Flash | NAND Flash |
|---|---|---|
| Common Interface | SPI, QSPI, OSPI | SPI NAND, QSPI NAND |
| Main Purpose | Firmware storage | Mass data storage |
| Execute In Place (XIP) | Supported (especially with QSPI/OSPI) | Not supported |
| Data Access Type | Random access | Page-based access |
| Read Flow | Direct read from memory array | Read page to cache → read from cache |
| SPI Support | Very common | Common in SPI NAND |
| QSPI Support | Widely used for high-speed read | Used to speed up cache read |
| OSPI Support | Increasingly used in high-performance systems | Rarely used |
| Data Bus Width | 1-bit (SPI), 4-bit (QSPI), 8-bit (OSPI) | Usually 1-bit or 4-bit |
| Typical Clock Speed | Up to 133 MHz or higher | Usually lower than NOR |
| Implementation Complexity | Simple driver | More complex (ECC, bad block management) |
| Typical Applications | Bootloader, firmware, MCU code storage | SSD, eMMC, USB storage |
The choice depends on system priorities: Choose NOR Flash if you need fast random read speeds and absolute firmware security; choose NAND Flash if the top priority is large capacity and budget optimization. In practice, engineers often combine both: using a small NOR chip for booting and a large NAND chip to store all user data, creating a system that is both responsive and powerful.
3. Conclusion
The evolution of communication interfaces from standard SPI to high-performance QSPI and OSPI has significantly boosted data throughput, enabling modern embedded systems to handle complex tasks with smaller footprints. Understanding these architectural differences and protocol nuances is essential for any developer looking to balance performance, reliability, and cost in today’s evolving electronic landscape.
Bluetooth LE Audio: Architecture, Auracast, and Unicast Roles
1. What is Bluetooth LE Audio? (Architecture Overview)
2. Unicast Audio in LE Audio: Server and Client Roles
3. Broadcast Audio and Auracast Technology
4. Why LE Audio Matters for IoT and Embedded Devices
1. What is Bluetooth LE Audio? (Architecture Overview)
Bluetooth LE Audio is not just a new codec; it’s a major overhaul of low-power audio transmission architecture. At the heart of this architecture are Isochronous Channels, enabling low-latency data transmission and tight synchronization.
To connect the actual audio data to these transmission channels, Bluetooth introduces a crucial intermediate layer: ISOAL (Isochronous Adaptation Layer).
The Role of ISOAL in LE Audio Architecture

As shown in the diagram above, ISOAL acts as a data “interpreter,” situated between the application layer and the physical layer:
- Upper Layer (SDUs): This layer generates Service Data Units (SDUs) – encoded audio frames (such as LC3 output). These SDUs may not match the size or timing of the radio packets that will carry them.
- ISO Adaptation Layer (ISOAL): ISOAL uses two mechanisms to adapt the data. In Unframed mode, SDUs are fragmented directly into PDUs, minimizing overhead. In Framed mode, SDUs are segmented and packaged with a TimeOffset field – a key factor enabling LE Audio to synchronize audio perfectly between the left and right earbuds.
- Lower Layer (PDUs): After being processed by ISOAL, the data becomes Protocol Data Units (PDUs) ready for the Baseband Resource Manager to transmit over the wireless environment.
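The two ISOAL mechanisms can be sketched in a simplified form (illustrative Python; the actual segmentation headers are defined by the Bluetooth Core Specification, and the layout below is not the spec encoding):

```python
def isoal_unframed(sdu: bytes, max_pdu: int):
    """Unframed mode sketch: split one SDU into link-layer PDUs of at
    most max_pdu bytes (real ISOAL also tags start/continue/end)."""
    return [sdu[i:i + max_pdu] for i in range(0, len(sdu), max_pdu)]

def isoal_framed(sdus, max_pdu: int, time_offsets):
    """Framed mode sketch: each SDU segment carries a header with a
    TimeOffset so the receiver can reconstruct presentation timing."""
    pdus = []
    for sdu, t_off in zip(sdus, time_offsets):
        header = t_off.to_bytes(2, "little")   # illustrative header
        pdus.extend(isoal_unframed(header + sdu, max_pdu))
    return pdus

print(isoal_unframed(b"ABCDEFGH", 3))  # [b'ABC', b'DEF', b'GH']
```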
2. Unicast Audio in LE Audio: Server and Client Roles
Unicast is a point-to-point (1-to-1) data transmission method. In the diagram, the red dot represents the source (such as a phone) establishing a separate data stream directed to a single receiving device, represented by the blue dot. Although there are other devices around (yellow dots), this data is secure and dedicated solely to the chosen target.
To accomplish this, audio data from the Upper Layer passes through the ISOAL layer to be fragmented or packaged into appropriate PDUs, which are then coordinated by the Baseband Resource Manager for precise transmission to the receiving device’s address.
In the Bluetooth LE Audio architecture, Unicast connections operate based on close coordination between two roles: Unicast Client and Unicast Server. The Unicast Client is typically the source device, such as a phone or computer, actively initiating the connection and managing synchronous data streams (CIS). Here, audio data from the upper layer is fed into the ISOAL layer for encapsulation, converting service data units (SDUs) into protocol packets (PDUs) via either Unframed or Framed modes, depending on latency and synchronization requirements.
Conversely, Unicast Servers are typically end devices such as headphones or hearing aids, acting as the receiving endpoints. After receiving PDU packets from the physical layer via the Baseband Resource Manager, the ISOAL layer within the server decodes and reassembles the data to restore the original audio. A key feature of this model is its point-to-point (1-to-1) transmission capability, ensuring privacy and high reliability thanks to a feedback mechanism between client and server, resulting in consistently stable and accurate audio.
Interaction diagram between unicast_server and unicast_client

Based on the sequence diagram, the interaction process takes place in three main phases:
- Discovery Phase: The Unicast Server (acting as an Advertiser) continuously broadcasts advertising packets (ADV_IND). The Unicast Client (acting as a Scanner) scans the frequency band to find a suitable Server.
- Connecting Phase: Upon finding a match, the Client becomes the Initiator and sends a CONNECT_REQ connection request. The Server responds with CONNECT_RSP to confirm the handshake.
- Connected Phase: After a successful connection, the two devices exchange audio data (DATA_TX/RX). At this point, the roles are clearly defined: the Client becomes the Master/Central (coordinator) and the Server becomes the Slave/Peripheral (executor).
| Feature | Unicast Client | Unicast Server |
|---|---|---|
| Core Role | Controlling device & Audio Source | Executing device & Audio Sink |
| Initial State | Scanner / Initiator: Actively scans and initiates connection requests | Advertiser: Broadcasts presence and waits to be discovered |
| Role After Connection | Master / Central: Manages connection parameters and transmission scheduling | Slave / Peripheral: Follows scheduling and responds to data requests |
| ISOAL Layer Processing | Converts audio data (SDUs) into protocol packets (PDUs) for transmission | Receives PDUs and reassembles them into original audio data (SDUs) |
| Communication Model | Targeted data transmission (1-to-1) | Receives data specifically addressed to it |
| Typical Devices | Smartphone, Laptop, Smart TV | TWS earbuds, Bluetooth speaker, Hearing aid |
3. Broadcast Audio and Auracast Technology
Broadcast Audio in Bluetooth LE Audio allows a single device to transmit audio to multiple receiving devices simultaneously without the need for individual connections as in the Unicast model. This technology is commercially available as Auracast™, opening up the possibility of public address systems where a single source can serve dozens or hundreds of surrounding devices at the same time. Instead of using Connected Isochronous Stream (CIS), Broadcast Audio uses Broadcast Isochronous Stream (BIS) organized within a Broadcast Isochronous Group (BIG). Audio is encoded using the LC3 codec and broadcast periodically via Extended Advertising and Periodic Advertising mechanisms so that receiving devices can synchronize.
Auracast Source
An Auracast Source is a device that acts as an audio broadcast transmitter. It encodes the audio signal using LC3, then creates a BIG containing one or more BISs to transmit audio data at fixed intervals. Information about the broadcast stream is published via Extended Advertising and Periodic Advertising, allowing receiving devices to detect and synchronize. In some cases, the broadcast stream can be encrypted to restrict unauthorized access. Typical devices acting as an Auracast Source include public TVs, conference audio systems, airport transmitters, or an audio gateway in an embedded system. Importantly, the Source does not need to manage each individual receiving device, making the system highly scalable.
Auracast Sink
An Auracast Sink is a device that receives and plays back broadcast audio. Instead of establishing a GATT connection like a Unicast Client, the Sink simply scans Extended Advertising to detect the source, then synchronizes with Periodic Advertising and joins BIG to receive BIS data packets. After receiving the ISO data, the device decodes LC3 and outputs the audio to a DAC or speaker. Common devices acting as Sinks include LE Audio-enabled headphones, hearing aids, LE Audio Bluetooth speakers, or embedded audio receivers. Thanks to the broadcast mechanism, multiple Sinks can listen to the same audio source with low latency and high synchronization, suitable for environments such as gyms, movie theaters, conferences, or public areas.
For a device like headphones or a hearing aid to become a true Auracast Sink, it must not simply “listen” but possess a complex protocol layer structure to decode and synchronize audio from the airwaves.
Based on a hierarchical structure, Auracast Sink operates through the following strategic layers:
- Top-level Profiles (PBP): The top layer contains the Public Broadcast Profile (PBP). This is the set of rules that define how a device identifies and receives signals from public Auracast sources.
- Audio Middleware (GAF): The Generic Audio Framework (GAF) layer acts as a general audio management framework, helping the device process concurrent data streams consistently.
- Host Functionality: Here, components such as GAP (Generic Access Profile) manage scanning and detecting surrounding broadcast devices without complex handshake procedures.
- Core (Isochronous Channels & Extended Advertising): Extended Advertising: Allows the Sink to find descriptive information about the audio stream (such as channel name, language) broadcast from the source. Isochronous Channels: These are the physical “pathways” where the actual audio data travels. With Auracast, the Sink will synchronize with the BIS (Broadcast Isochronous Streams) to receive data.
- Audio Data Path (LC3): Finally, the data stream passes through the LC3 decoder. This is the most important link in converting the digital packets received from the Core layer into high-quality, low-latency audio signals for the user’s ears.
After a detailed analysis of the Unicast and Broadcast (Auracast) models in Bluetooth LE Audio, it’s clear that each mechanism is designed for completely different system objectives. Unicast focuses on private, controlled connections optimized for individual devices, while Auracast aims for public, scalable broadcasting and multi-device synchronization. The table below summarizes the key differences between the two models at the architectural, protocol, and practical application levels.
| Criteria | Unicast Audio | Auracast (Broadcast Audio) |
|---|---|---|
| Transmission Model | 1–1 or 1–few | 1–N (scalable to many devices) |
| Connection Required | Yes (LE Connection) | No connection required |
| ISO Transport | CIS (Connected Isochronous Stream) | BIS (Broadcast Isochronous Stream) |
| Stream Grouping | No BIG | Uses BIG (Broadcast Isochronous Group) |
| Stream Control | Client configures Server | Source broadcasts publicly |
| GATT Usage | Yes (PACS, ASCS) | Not required for streaming |
| Privacy Level | High (private link) | Public (optional encryption) |
| Scalability | Limited by connection count | Highly scalable |
| Typical Use Cases | Personal earbuds, voice calls | Airports, conferences, public TVs |
| Receiver Management | Individual device management | No per-device management needed |
4. Why LE Audio Matters for IoT and Embedded Devices

Bluetooth LE Audio not only improves sound quality but also revolutionizes how IoT and embedded systems deploy wireless audio transmission. Thanks to the LC3 codec and Isochronous Channels, LE Audio achieves better compression performance, lower latency, and lower power consumption compared to Bluetooth Classic A2DP. This is especially important for small, battery-powered devices such as true wireless earbuds, hearing aids, or industrial IoT nodes.
In the IoT field, LE Audio opens up many new application models. Multi-listener streaming allows a single device to simultaneously broadcast to multiple users while maintaining synchronization. Public broadcast systems using Auracast can be deployed in airports, shopping malls, or gyms without complex infrastructure. For low-power hearing devices, LC3 significantly reduces power consumption while maintaining high sound quality and extending battery life. In industrial environments, LE Audio can be integrated into industrial voice notification systems, where voice alerts need to be transmitted reliably, with low latency and robust operation in embedded systems.
More importantly, LE Audio is designed with a modern systems mindset: clearly separating the control layer (GATT), the stream configuration layer, and the ISO data transmission layer. This allows for a more flexible firmware architecture, easier scalability, and compatibility with RTOS platforms such as Zephyr or dedicated dual-core audio SoCs.
Conclusion
Bluetooth LE Audio is not simply an upgrade of traditional Bluetooth Audio, but a completely new architecture with two core operating models: Unicast and Auracast Broadcast. Each model serves a different system objective but can coexist within a single device platform.
In particular, Unicast Server and Unicast Client are suitable for personal audio applications such as headsets, voice calls, or hearing aids, where private connectivity and tight control are required. Conversely, Auracast Source and Auracast Sink are geared towards large-scale broadcast models, where a single source can serve multiple receiving devices simultaneously without managing each connection individually.
The flexibility between these two transmission mechanisms, combined with LC3 and Isochronous Channels, makes LE Audio a strategic platform for the next generation of embedded audio systems — from consumer devices and industrial IoT to smart public address systems.
Enable ADB and SSH Forwarding in Yocto Securely
1. Debug Architecture Overview
2. How to Enable ADB in Yocto (Step-by-step Guide)
2.1. Add ADB Package to Yocto Image
2.2. Configure and Enable adbd Service (systemd)
3. How to Enable SSH in Yocto and Configure Secure Port Forwarding
3.1. Add OpenSSH to Yocto Image
3.2. Enable and start SSH service
3.3. Configure SSH Key Authentication (Recommended for Production)
3.4. SSH Port Forwarding for Debugging Internal Services
4. ADB vs SSH in Yocto: Differences, Use Cases and Best Practices
5. Production Hardening Checklist for Yocto Devices (Security Best Practices)
6. Conclusion: Secure and Efficient Remote Access in Yocto-Based Embedded Systems
1. Debug Architecture Overview

In embedded Linux environments, ADB (Android Debug Bridge) is commonly used during the development phase due to its fast shell access, convenient file push/pull functionality, and minimal security configuration requirements. This tool is particularly suitable for rapid debugging, log inspection, or direct system manipulation during firmware development.
For production and remote maintenance environments, SSH (Secure Shell) is a more optimal choice thanks to its secure encryption mechanism, key-based authentication support, and tight access control. SSH allows engineers to manage, update, and maintain devices remotely in a secure and stable manner.

Furthermore, port forwarding allows access to internal services such as Web UI, REST API, MQTT, or gRPC running on the device without opening public ports to the Internet. This not only enhances system security but also facilitates debugging, testing, and monitoring of services during development.

2. How to Enable ADB in Yocto (Step-by-step Guide)
2.1 Add ADB Package to Yocto Image
To use ADB in Yocto, the image needs to contain adbd (the ADB daemon) and the client tools. You can add the necessary packages to the image recipe or local.conf: IMAGE_INSTALL:append = " android-tools-adbd android-tools-adb" (note the leading space – the :append operator does not insert one).
If the build process reports a package not found error, check if the Android layer has been added to bblayers.conf, if the BSP supports android-tools, and if the recipe exists in meta-openembedded or the corresponding layer. After configuration is complete, rebuild the firmware using bitbake <your-image-name> and re-flash the board to apply the changes.
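A minimal configuration sketch, assuming the android-tools recipe is available in your layers:

```conf
# conf/local.conf — note the leading space required by :append
IMAGE_INSTALL:append = " android-tools-adbd android-tools-adb"
```

Then rebuild with bitbake <your-image-name> and re-flash the board.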
2.2 Configure and Enable adbd Service (systemd)
- To have ADB run automatically after booting, configure a systemd service for it by creating the file /etc/systemd/system/adbd.service.
- Then enable the service and check the daemon status and port 5555.
- If port 5555 is in the LISTEN state, ADB is ready for TCP/IP connections.
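A sketch of the unit file and verification steps (the daemon path and options are assumptions and depend on how your android-tools recipe packages adbd):

```ini
# /etc/systemd/system/adbd.service — example unit, adjust paths to your BSP
[Unit]
Description=Android Debug Bridge daemon
After=network.target

[Service]
ExecStart=/usr/bin/adbd
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

```shell
systemctl enable --now adbd        # enable and start the service
systemctl status adbd              # check the daemon state
netstat -tlnp | grep 5555          # LISTEN here means ADB over TCP is ready
```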
3. How to Enable SSH in Yocto and Configure Secure Port Forwarding
In embedded Linux environments, SSH (Secure Shell) is the standard remote access method for production systems. Compared to ADB, SSH offers strong encryption, key-based authentication, and integration capabilities into DevOps or CI/CD infrastructure.
3.1 Add OpenSSH to Yocto Image
In many Yocto minimal images, the SSH server is not installed by default to keep the system lightweight. To enable SSH, you need to add OpenSSH to the image recipe or local.conf by updating the IMAGE_INSTALL variable: IMAGE_INSTALL:append = " openssh openssh-sftp-server"
The openssh package provides the SSH daemon (sshd) that allows remote shell access, while openssh-sftp-server supports secure file transfer via the SFTP protocol. After modifying the configuration, rebuild the firmware with bitbake <your-image-name> and re-flash the device for the changes to take effect. Integrating SSH during the build phase ensures the system is ready for remote access during development or deployment.
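As a local.conf sketch (the leading space is again required because :append performs no automatic spacing):

```conf
# conf/local.conf
IMAGE_INSTALL:append = " openssh openssh-sftp-server"
```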
3.2 Enable and Start SSH Service
- After the device boots into the system, activate and start the SSH daemon.
- To confirm SSH is working, check the service status and the network port.
- If port 22 is listening, you can connect from your computer.
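A sketch of the commands involved (the unit may be named sshd or ssh depending on the distro configuration; <device-ip> is a placeholder):

```shell
systemctl enable --now sshd        # activate and start the SSH daemon
systemctl status sshd              # confirm the service is running
ss -tlnp | grep ':22'              # port 22 in LISTEN state => ready
ssh root@<device-ip>               # then connect from the host machine
```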
3.3 Configure SSH Key Authentication (Recommended for Production)
The principle of establishing a secure connection: The core mechanism of SSH is to create an encrypted “tunnel” (SSH2 session) through an insecure network environment. Data from applications (such as TCP clients) is sent to an internal port on the client machine, then encapsulated, encrypted, and transmitted to the SSH server for decryption before being sent to the final destination.
SSH provides two common port forwarding techniques for orchestrating data flow:
- Local Forwarding (-L): Allows clients to access services within the server’s internal network (such as a Web Server) by mapping the client’s port to the server’s port.
- Remote Forwarding (-R): Allows the server or external devices to access services running within the client’s internal network.
Dynamic Port Forwarding: Using the -D parameter, SSH can turn a remote server into a flexible SOCKS proxy. Instead of forwarding a fixed port, all traffic from client-side applications (such as browsers, curl commands) is routed through this single SSH tunnel to securely access the internet or other network resources.
SSH Key-Based Authentication: To establish a secure connection without a password, SSH uses a public key and a private key pair. The process involves the client sending the public key ID, the server checking for the key’s existence in the authorization list, and then sending an encrypted challenge message. The client decrypts, calculates the hash, and sends it back to the server for confirmation before establishing the official connection.
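A typical key-based setup, sketched from the host side (the key file name and the root user are illustrative choices):

```shell
ssh-keygen -t ed25519 -f ~/.ssh/yocto_device             # generate a key pair
ssh-copy-id -i ~/.ssh/yocto_device.pub root@<device-ip>  # install the public key
ssh -i ~/.ssh/yocto_device root@<device-ip>              # log in without a password
```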
3.4 SSH Port Forwarding for Debugging Internal Services
One of SSH’s most powerful features is port forwarding, which allows access to internal services without opening public ports.
- Core Security Principle: In Yocto development, SSH Port Forwarding plays a crucial role in data protection. This technique creates an encrypted “tunnel” (SSH2 session) through insecure network environments, effectively preventing eavesdropping.
- Flexible Access Navigation: Depending on needs, we can use Local Forwarding (-L) to access the device’s internal services from the host machine, or Remote Forwarding (-R) to reverse the mapping. This method easily connects network components separated by firewalls.
- Optimized Dynamic Connection: With Dynamic Port Forwarding (-D), the SSH connection becomes a flexible SOCKS Proxy. It allows the simultaneous routing of data streams from multiple applications through a single port, optimizing resources and enhancing traffic management.
- Practical application for ADB and Yocto: Applying this mechanism to encapsulate ADB traffic allows for tight access control via SSH key. This ensures that all debugging and data transmission operations on the Yocto device are always performed within a reliable and fully encrypted channel.
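The forwarding variants above, sketched as commands (ports and addresses are illustrative):

```shell
# Local forwarding (-L): reach the device's internal Web UI at localhost:8080
ssh -L 8080:localhost:8080 root@<device-ip>

# Remote forwarding (-R): let the device reach a service running on the host
ssh -R 9000:localhost:9000 root@<device-ip>

# Dynamic forwarding (-D): turn the SSH session into a SOCKS proxy
ssh -D 1080 root@<device-ip>

# Encapsulating ADB: tunnel the adbd TCP port, then connect locally
ssh -L 5555:localhost:5555 root@<device-ip>
adb connect localhost:5555
```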
4. ADB vs SSH in Yocto: Differences, Use Cases and Best Practices
When deploying Embedded Linux with Yocto Project, the choice between ADB (Android Debug Bridge) and SSH (Secure Shell) directly impacts security, performance, and production capabilities. Each tool serves a different purpose in the firmware development lifecycle.
Core Differences Between ADB and SSH
| Feature | ADB | SSH |
|---|---|---|
| Remote Shell | ✔ | ✔ |
| File Transfer | adb push / pull | scp / sftp |
| Port Forwarding | adb forward | ssh -L / -R / -D |
| Encryption | Limited | Strong encryption (AES, ChaCha20) |
| Authentication | Basic | Public key, certificate |
| Production Deployment | Not recommended | Standard production |
In summary: ADB is optimal for development speed, while SSH is optimal for security and long-term operation.
When to Use ADB in Yocto Development
ADB should be used primarily in the development, prototyping, and firmware debugging phases. This tool allows for quick shell access, direct file push/pull, and port forwarding without complex SSH configuration. In an R&D environment, ADB is particularly useful for log inspection, testing internal services, or quick device manipulation via USB or TCP/IP.
However, ADB (especially ADB over TCP/IP) does not provide the same robust authentication and access control mechanisms as SSH. Therefore, enabling ADB in production firmware can create serious security vulnerabilities. The best practice is to enable ADB only in the development image and remove it entirely from the production build.
When to Use SSH in Embedded Linux Production
In embedded Linux and Yocto production systems, SSH is the standard and mandatory access method to ensure system security. SSH provides end-to-end encrypted connections, supports public key authentication, user authorization, and the ability to restrict access by IP address or security policy. This helps IoT or industrial devices maintain security even when deployed in the field.
Beyond remote shell, SSH also supports secure port forwarding, automation via CI/CD pipelines, and integration with management tools like Ansible. For these reasons, SSH should be the primary remote access method in production firmware, while ADB should only exist in internal development environments.
5. Production Hardening Checklist for Yocto Devices (Security Best Practices)
When deploying Yocto devices to a production environment, enabling ADB and SSH must be accompanied by appropriate security configurations to reduce attack surfaces and ensure system security. Production firmware should be designed to be minimalist, retaining only the components truly necessary for operation and maintenance.
Disable ADB in Production: ADB should only be used during the development phase for debugging, log inspection, and firmware testing. In the production image, ADB should be completely removed from the build process, and ADB TCP ports should not be opened. Keeping ADB in the release firmware can create serious security vulnerabilities, especially when the device is networked. Best practice is to separate the development image and production image from the outset to tightly control remote access configurations.
Enforce SSH Key Authentication: In the production system, SSH should be configured with key-based authentication instead of password login. This helps prevent brute-force attacks and reduces the risk of authentication information leakage. In the /etc/ssh/sshd_config file, disable PasswordAuthentication, restrict PermitRootLogin, and ensure PubkeyAuthentication is enabled. Using SSH keys enhances security when the device is operating in the field or in an industrial environment.
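The `/etc/ssh/sshd_config` changes described here can be sketched as follows (standard OpenSSH options; verify against the `sshd_config(5)` man page of the OpenSSH version your image actually ships):

```
# /etc/ssh/sshd_config (production image)
PasswordAuthentication no        # key-based login only
PubkeyAuthentication yes
PermitRootLogin no               # or "prohibit-password" if root+key is required
```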
Restrict Network Access: Production firmware should only open ports that are truly necessary, typically port 22 for SSH. Non-essential services should be disabled or not exposed to the network. Additionally, firewall rules should be applied to limit IP access or require connections via an internal VPN. Controlling network exposure is a crucial layer of protection in the security architecture of an Embedded Linux system.
Apply the Least Privilege Principle: Do not use root for all maintenance or operation. Instead, create a separate user for maintenance and grant limited sudo privileges according to usage needs. Applying the least privilege principle minimizes damage if an account is compromised and increases control over system access.
Development vs Production Configuration
The distinction between development and production images needs to be clearly defined during the firmware build process. Development images can enable ADB, allow SSH password login, and integrate debugging tools to support R&D. Conversely, production images must remove ADB, mandate SSH key usage, restrict root login, and remove all debug services to ensure the highest level of security before actual deployment.
| Component | Development | Production |
|---|---|---|
| ADB | Enabled | Disabled |
| SSH Password | Allowed | Disabled |
| SSH Key | Optional | Mandatory |
| Root Login | Allowed | Restricted |
| Debug Tools | Enabled | Removed |
6. Conclusion: Secure and Efficient Remote Access in Yocto-Based Embedded Systems
Enabling ADB and SSH in Yocto is not simply about configuring remote access; it directly impacts the development process, system security, and long-term operational capability of embedded Linux devices.
During the development phase, ADB speeds up debugging, facilitates file manipulation, and makes testing internal services easier. However, when transitioning to a production environment, SSH becomes a mandatory standard due to its strong encryption mechanism, public key authentication, and tight access control.
To build a secure and professional Yocto system, you should:
- Separate development image and production image.
- Disable ADB in the release firmware.
- Implement SSH key-based authentication.
- Restrict network access with a firewall and limit user privileges.
- Perform system hardening before actual deployment.
Key Takeaways
- ADB is suitable for development and R&D.
- SSH is a secure remote access solution for production.
- Port forwarding helps debug internal services without opening public ports.
- Hardening is a mandatory step before releasing firmware.
Final Thoughts
In IoT projects, Industrial Linux, or commercial devices using Yocto, designing the right remote access architecture from the start will help:
- Reduce security risks
- Simplify maintenance and firmware updates
- Optimize product development workflows
By combining ADB for development and SSH for production, you can build a flexible and secure embedded Linux system.
Kalman Filter: Breakthrough in IoT & Navigation
1. What is Kalman Filter? The “Silent Hero” of Modern Navigation
2. Why Use a Kalman Filter? Solving Sensor Noise and Uncertainty in IoT
3. Core Principles: How the Kalman Filter Algorithm Works Step-by-Step
4. Pros and Cons: Key Advantages and Limitations of Kalman Filters
5. Sensor Fusion in Action: Kalman Filter Applications in IoT and Autonomous Vehicles
6. Real-world Case Study
7. Conclusion: The Future of Precision Positioning in the IoT Era
1. What is a Kalman Filter? The “Silent Hero” of Modern Navigation
Have you ever wondered how your phone can still accurately pinpoint your location on a map even when the GPS signal is intermittent while navigating through tall buildings or tunnels? Or how a self-driving car can smoothly navigate its lane despite constant sensor vibrations? The answer behind these intelligent technologies is the Kalman Filter.
So what is a Kalman Filter? Simply put, it’s not a physical filter (like a water or air filter), but rather an intelligent mathematical algorithm that uses a series of measured values, affected by noise or error, to estimate a variable, thereby increasing accuracy compared to using only a single measured value. Its purpose is to help us “guess” (estimate) the true state of a system, even when the information we receive from sensors is noisy or not entirely accurate. It filters out “noise” from chaotic data to find the “true” signal.

Twelve breathtaking minutes as humans landed on the Moon.
Do you know what this historic moment on the Moon, the smartphone in your pocket, and a modern self-driving car have in common?
They are all secretly using the same tool to answer the seemingly simple yet incredibly complex question: “Where am I in this chaotic world?”. It’s one of the greatest mathematical achievements of the 20th century, a “silent hero” behind most modern navigation technologies, yet its name is little known. Let’s uncover the secrets of the algorithm that took humans to the Moon and is guiding our future.
2. Why is a Kalman filter needed? To solve the problem of noise and uncertainty.
In an ideal technological world, if you want to know the position of an object, you simply use a measuring tape. But in the real world, nothing is perfect. The Kalman filter was created to address the core problem: uncertainty.
The practical problem: Sensor error and environmental noise
Any measuring device has an error margin, and the environment adds disturbances of its own:
- Sensors are always subject to noise: your phone’s GPS can be off by a few meters to tens of meters depending on weather conditions, and a temperature sensor can fluctuate slightly even when the room temperature is constant. No measurement is 100% accurate.
- The environment is constantly changing: wind, road friction, engine lag, and similar external factors make predicting the movement of an object difficult.
If you rely solely on a single source of information (for example, only a GPS receiver experiencing interference), the result is an erratic and inaccurate estimate of your route.
The need for Sensor Fusion to optimize accuracy
To address this problem and optimize accuracy, engineers use the “Sensor Fusion” process to combine multiple different sources of information, and the Kalman filter is the most powerful tool for this. Imagine driving in thick fog: you have a predictive model based on maps and speedometers to calculate your position, but this calculation can be wrong due to slippery roads or wind. At the same time, you also have “measuring sensors”—faint road markers—but the fog makes you unsure if you’re seeing them correctly. At this point, the Kalman Filter acts as an intermediary “brain,” constantly evaluating whether to trust its own calculations more or those vague benchmarks. Based on the reliability of each source, it combines them to provide the best possible location estimate.
3. Core Principles: How the Kalman Filter Algorithm Works Step-by-Step
The Kalman filter never fully trusts anything. It operates based on a combination of mathematical modeling (prediction) and sensing (measurement).
Step 1: Prediction
The algorithm uses the physical state equation to shift the system from time t-1 to t.
- Logic: Based on the old velocity and position, it calculates the theoretical new position.
- Uncertainty: Due to external factors (system noise – Process Noise), the confidence probability range widens. This means that after each prediction step, we become less certain about the object’s exact position.
Step 2: Measurement
This is where raw data is collected from the sensors (GPS, Radar, Lidar).
- Logic: The sensor provides a direct observational value of the current state.
- Sensor error: Every measurement involves noise (Measurement Noise). This data is also represented by a probability distribution with its own standard deviation depending on the accuracy of the device.
Step 3: Correction/Update
This is the core logic of the algorithm, where the Kalman Gain (K) acts as the arbiter.
- Calculating K: This coefficient is determined by the ratio between the uncertainty of the forecasting model and the uncertainty of the sensor. If the sensor has low noise, K approaches 1 (the algorithm favors the measurement result). If the sensor has high noise, K approaches 0 (the algorithm favors the forecast result from the model).
- Optimal Estimate: This combination produces a new probability distribution with sharper peaks (smaller standard deviation) than both original data sources. This means the final result is more accurate than any single data source.
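The three-step loop above can be sketched as a minimal one-dimensional Kalman filter. This is an illustrative toy (scalar state, made-up numbers), not a production implementation; real systems use vector states and covariance matrices:

```python
# Minimal 1-D Kalman filter sketch. Symbols follow the text:
# x = state estimate, p = estimate variance, q = process noise,
# r = measurement noise, k = Kalman gain. Numbers are illustrative.

def predict(x, p, u, q):
    """Step 1: shift the state forward and widen the uncertainty."""
    x = x + u            # motion model (e.g. velocity * dt)
    p = p + q            # process noise makes us less certain
    return x, p

def update(x, p, z, r):
    """Steps 2-3: blend the prediction with a noisy measurement z."""
    k = p / (p + r)      # Kalman gain: ratio of the two uncertainties
    x = x + k * (z - x)  # pull the estimate toward the measurement
    p = (1 - k) * p      # fused variance: smaller than either input
    return x, p, k

# One cycle: we believe we moved 1.0 m, then a noisy sensor reads 1.2 m.
x, p = predict(x=0.0, p=1.0, u=1.0, q=0.5)   # x = 1.0, p = 1.5
x, p, k = update(x, p, z=1.2, r=1.0)         # k = 1.5 / 2.5 = 0.6
# The fused variance p = 0.6 is smaller than both 1.5 (prediction)
# and 1.0 (measurement): the combined estimate is sharper than either.
```

Note how the gain behaves exactly as described in the text: with a noisier sensor (larger r), k shrinks toward 0 and the filter trusts its model; with a precise sensor, k grows toward 1.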
4. Pros and Cons: Key Advantages and Limitations of Kalman Filters
The Kalman algorithm has become an indispensable tool in modern engineering due to two outstanding advantages: mathematical optimization and computational efficiency. Theoretically, it has been shown to provide the best possible state estimation (with the smallest mean square error), provided the input assumptions are met. Besides accuracy, Kalman’s practical strength lies in its recursive nature. The algorithm doesn’t need to store entire cumbersome historical data sets; it only needs the immediately preceding state to compute the current step. This “lightweight” characteristic makes it extremely ideal for embedded computing systems with limited memory and power resources, such as in autonomous robots or handheld mobile devices.
However, the standard (basic) Kalman filter is not a panacea, due to significant limitations when applied to chaotic real-world scenarios. The biggest limitation is that it only works well with LINEAR systems, that is, systems that can be described by simple linear equations. Meanwhile, most real-world motion (for example, a robot performing a turn) is complex and non-linear. Furthermore, the algorithm relies on the rigid assumption that system noise follows a Gaussian (normal, bell-shaped) distribution. If the actual environmental noise has a different, unusual shape, the filter’s estimation quality degrades severely.

Gaussian Noise
To overcome the linearity barrier and bring the algorithm closer to practical applications, scientists have developed powerful upgraded versions. The most well-known and widely used solution is the Extended Kalman Filter, abbreviated as EKF. EKF handles nonlinear systems by applying local “linearization” techniques at each computation step, straightening complex curves to apply the basic Kalman mechanism. Thanks to this adaptability, EKF has now become an industry standard and is widely used in most advanced GPS and navigation systems today.

The Kalman Filter is considered one of the greatest algorithms of the 20th century, playing a core role in the Apollo moon landings and present in most modern rovers. To understand why this algorithm is so important yet so challenging, we need to consider two aspects: its superior computational capabilities and its stringent physical limitations.
Below is a detailed summary of the advantages, limitations, and potential extensions of the Kalman Filter:

Kalman Filter: Overview of Strengths and Limitations
5. Sensor Fusion in Action: Kalman Filter Applications in IoT and Autonomous Vehicles
The Kalman filter is one of the most widely used algorithms in engineering history.
- Applications in GPS Positioning and Navigation: On phones and cars, positioning systems often combine GPS (accurate location, but slow updates and prone to signal loss) with inertial sensors (accelerometers and gyroscopes: very fast response, but prone to drift errors over time). The Kalman filter combines the advantages of both, providing smooth positioning even when the vehicle passes through short tunnels.
- Key Role in Robotics and Autonomous Vehicles: Localization: Helps robots know their exact location on a surface map (SLAM); Object Tracking: Autonomous vehicles use Kalman to predict the movement of pedestrians or other vehicles on the road, thereby calculating a path to avoid collisions.
- Applications in Aerospace, Signal Processing, and Finance: Aerospace: Extremely important for determining the position and attitude of aircraft, rockets, and spacecraft. The Apollo program used Kalman filters in its navigation computers. Signal processing & finance: Used to smooth volatile stock market data or remove noise in audio and video signals.
Returning to the spacecraft mentioned above, this is where Kalman filters made their name.
During the final descent of the Eagle module, the navigation computer (AGC) became overloaded and repeatedly raised the “1202 Alarm”. In this context, the Kalman-style estimator acted as the “silent brain” reconciling conflicting data streams. To predict (the physical model), the computer calculated the descent trajectory from the Moon’s gravitational pull and the thrust of the descent engine; to measure, the landing radar continuously fired signals at the surface to determine actual distance and velocity. But the radar data was heavily disrupted by the uneven terrain and the lunar dust blown up by the engine. If the computer had relied entirely on the radar data, the spacecraft would have made erratic engine adjustments and run out of fuel.
This is how the Kalman filter handled it during those twelve minutes:
- Real-time noise filtering: The Kalman algorithm constantly compares: “The radar says it’s 5 meters away, but the physical model says it should be 7 meters.”
- Calculating Kalman Gain (K): Because the radar was experiencing interference at that time, the coefficient K automatically decreased. The algorithm “suspected” the radar and placed more trust in the stable orbital model.
- Optimal blending: Instead of jerky jumps between numbers, Kalman provided a smooth estimate, allowing Neil Armstrong to maintain stable control and steer the spacecraft away from large rocks, landing safely with only 25 seconds of fuel remaining.
6. Real-World Case Study
The story of Apollo 11 is not just a historical milestone; it also laid the foundation for modern navigation technology. Today, Kalman filters are no longer confined to NASA’s massive computers; they are present in every smartphone, drone, and autonomous vehicle through the GNSS/INS integration model.
To understand why we need this combination, consider a GNSS satellite system (GPS, GLONASS, and others). GNSS provides absolute positioning but updates slowly and is prone to signal loss (in tunnels or under tree canopies). Here the Kalman filter acts as the “conductor,” combining satellite data with an inertial navigation system (INS), consisting of accelerometer and gyroscope sensors, to create a continuous, highly accurate stream of positioning data.
To understand the power of the Kalman Filter, let’s consider a real-world car navigation system where it has to solve the problem of combining data from two sensor sources with contrasting characteristics:
- Accelerometer (INS): Extremely fast response to any change in motion, but if used for long-distance calculations, errors accumulate very quickly (drift).
- GPS (GNSS): Provides accurate absolute position and velocity over the long term, but the update rate is very slow (usually only once per second) and often experiences delays.
We will solve this problem in two stages (two consecutive Kalman loops):
Case 1: Velocity Estimation (A combination of “Quickness” and “Accuracy”)
In the integrated navigation system, the goal of Case 1 is to optimize instantaneous velocity estimation by combining two data sources with different noise characteristics, aiming for both responsiveness and system accuracy.
- Prediction Phase – Leveraging Dynamics: Using data from inertial sensors as input. By integrating the kinematic equations, the filter provides a prediction of the velocity at time $k$. The advantage of this method is its high sampling frequency and instantaneous response to dynamic changes (acceleration, braking). However, due to the influence of white noise and bias, the integration process causes cumulative error (drift), so the prediction gradually deviates over time.
- Update Phase – Absolute Reference Correction: At this stage, the velocity from the GNSS receiver is used as a reference measurement (Observation). Although GNSS data is highly reliable in terms of absolute values, it is limited by the low update frequency and large signal delays due to satellite processing.
The Kalman Filter’s optimization mechanism:

The algorithm balances the two data sources using the Kalman Gain. When the vehicle undergoes a sudden change in state, the difference between the accelerometer prediction and the actual GNSS measurement increases. The Kalman filter analyzes the covariance of the error to determine the weighting:
- In dynamic state: When there is a large change in acceleration that the GNSS hasn’t updated in time due to delay, the filter will temporarily increase the weighting of the accelerometer prediction model, helping the system track the actual velocity trajectory without data lag.
- In steady state: The filter prioritizes data from GNSS to “anchor” the velocity value, while estimating and removing the accelerometer bias, preventing the data from drifting.
Result: The output of Case 1 is an optimal velocity estimate that is smooth, high-bandwidth, and free of cumulative error. This is the clean input for the position estimation model in the next stage.
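As a hedged, purely illustrative sketch of Case 1 (the sensor rates, noise levels, and the constant accelerometer bias below are all made-up assumptions, not real sensor specs), a scalar Kalman loop fusing a fast-but-drifting accelerometer with sparse GNSS velocity fixes might look like:

```python
# Case 1 sketch: predict velocity by integrating acceleration at every
# step; correct with a GNSS velocity fix when one arrives (None otherwise).

def fuse_velocity(accels, gnss_vel, dt=0.01, q=0.05, r=0.25):
    v, p = 0.0, 1.0          # velocity estimate and its variance
    out = []
    for a, z in zip(accels, gnss_vel):
        v += a * dt          # predict: integrate acceleration (drifts)
        p += q               # uncertainty grows between GNSS fixes
        if z is not None:    # update only when a GNSS sample arrives
            k = p / (p + r)
            v += k * (z - v)
            p *= (1 - k)
        out.append(v)
    return out

# 100 Hz accelerometer with a constant +0.2 m/s^2 bias while the car is
# actually stationary; GNSS reports ~0 m/s once every 10 samples.
accels = [0.2] * 50
gnss = [0.0 if i % 10 == 9 else None for i in range(50)]
vels = fuse_velocity(accels, gnss)
# Unfiltered integration would drift steadily; the fused estimate stays
# bounded near zero because each GNSS fix pulls it back.
```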
Case 2: Estimating Location/Distance (Utilizing the results of Case 1)
After obtaining the optimal velocity vector from Case 1, the goal of this phase is to determine the precise spatial coordinates of the vehicle using a second-stage Kalman filter (Cascaded Kalman Filter).
- Prediction Phase – Smooth Trajectory Interpolation: The system uses the “clean” velocity estimate from the output of Case 1 as the input variable to perform position integration. Based on a dynamic state-space model, the filter predicts the vehicle’s next position at a high frequency (e.g., 100 Hz). Thanks to the denoised, bias-corrected velocity from the previous step, the predicted trajectory becomes extremely smooth, eliminating the position “jumps” often seen in devices using only GNSS.
- Update Phase – Coordinate Anchoring Correction: Absolute coordinates (Longitude, Latitude) from the GNSS system are used as a correction measurement. This is a source of “ground truth” data to control errors. Although the velocity integration in the prediction step was very good, mathematically, even the smallest errors will accumulate over time (Integration Drift). The coordinate points from the GNSS act as absolute “anchors” to reset these errors.
The role of the Kalman Filter in trajectory optimization: The algorithm performs Sensor Fusion at a more complex level:
- Maintaining Continuity: In scenarios of temporary satellite signal loss (e.g., a vehicle entering a tunnel or under dense foliage), the Kalman filter relies entirely on the predicted model from the velocity in Case 1 to “continue” the trajectory. This capability helps the system maintain continuous navigation without interruption.
- Smoothing Data: When a new GNSS signal (which often has noise of a few meters) is detected, the filter does not immediately jump to the new location. Instead, it calculates the difference between the predicted and measured location (Innovation), then subtly adjusts the estimated location through the weights of the covariance matrix.
Overall result: Through a two-stage Kalman filter structure, the system achieves optimal convergence: providing position and velocity with high temporal resolution (from INS) and consistently high absolute accuracy (from GNSS). This is the core principle that enables autonomous vehicles and guided missiles to operate stably in the most dynamically complex environments.
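Continuing the sketch, the second (cascaded) stage can be illustrated the same way. The velocity profile, update rates, and noise figures below are again assumptions; in a real system the input would be the Case 1 output rather than a constant:

```python
# Case 2 sketch: integrate the cleaned velocity into position, anchoring
# with sparse GNSS coordinate fixes. A 2-second "tunnel" (no fixes after
# t = 2 s) shows the filter coasting on the integrated velocity alone.

def fuse_position(fused_vel, gnss_pos, dt=0.01, q=0.01, r=4.0):
    x, p = 0.0, 1.0          # position estimate and its variance
    track = []
    for v, z in zip(fused_vel, gnss_pos):
        x += v * dt          # predict: smooth trajectory interpolation
        p += q               # drift accumulates between coordinate anchors
        if z is not None:    # GNSS fix: pull gently toward the anchor
            k = p / (p + r)
            x += k * (z - x)
            p *= (1 - k)
        track.append(x)
    return track

# 5 s at 100 Hz, steady 10 m/s cruise. GNSS fixes once per second for the
# first 2 s (here simply the true position), then signal is lost.
vel = [10.0] * 500
gnss = [10.0 * (i + 1) * 0.01 if i % 100 == 99 and i < 200 else None
        for i in range(500)]
track = fuse_position(vel, gnss)
# With a clean velocity input the track continues smoothly through the
# outage instead of stopping or jumping.
```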
7. Conclusion: The Future of Precision Positioning in the IoT Era
The Kalman filter is proof of the power of mathematics in conquering chaotic realities. From the historic Apollo missions to today’s era of autonomous vehicles, this algorithm remains the “soul” that helps positioning systems achieve remarkable accuracy through its ingenious Sensor Fusion mechanism.
In short, mastering the Kalman filter is key to creating groundbreaking IoT products—where smoothness and accuracy are paramount. This is not just technology, but the foundation for a safer and smarter automated future.
TLV320AIC3254 Linux Audio Codec Driver Integration Guide
1. Introduction to TLV320AIC3254 Linux Audio Driver
1.1. Purpose of the TLV320AIC3254 Codec Driver
1.2. Scope and Out-of-scope Features
2. System Overview for TLV320AIC3254 Audio Codec
2.1. Hardware Architecture Overview
2.2. Functional Role of Each System Component
3. Linux Audio Software Architecture (ALSA SoC)
3.1. Audio Data Path in ALSA ASoC Framework
3.2. Audio Control Path (Mixer, Codec, Registers)
4. Linux Kernel Configuration for TLV320AIC3254
4.1. Enable Subsystem in Defconfig
4.2. Disable Audio flag check (AUDIO_ENABLE_FILE)
4.3. Add Sound Card Support to the Build System
5. Device Tree Integration for TLV320AIC3254
5.1. Declare I2C Controller and Codec Node
5.2. Declare Regulator (Source)
5.3. Declare Sound Card (MDM9607 Audio Common)
6. Verify TLV320AIC3254 Driver Operation
6.1. Checking Sound Card Registration
6.2. Testing Audio Playback Using WAV Files
1. Introduction to TLV320AIC3254 Linux Audio Driver
1.1. Purpose of the TLV320AIC3254 Codec Driver
This document is compiled to provide in-depth technical guidance on the integration (porting) and activation (bring-up) process of the TI TLV320AIC3254 audio codec chip on the Cavli C10QM module platform (based on the Qualcomm MDM9607 chipset) running the Linux operating system.

The C10QM module exposes a digital audio interface (I2S/PCM) but does not include an integrated digital-to-analog/analog-to-digital converter (DAC/ADC) for analog audio; integrating an external codec is therefore required to support voice and music playback features.
Within the scope of this guide, the CC3200 Audio Boost board is used as the reference hardware. Note, however, that the CC3200 microcontroller on this board is disabled and ignored entirely. The C10QM module acts as the host/master device, fully responsible for:
- Data flow management (Data path): Transmitting/receiving PCM audio data via the I2S/MI2S interface.
- Control management (Control path): Configuring the codec’s registers, clock, and power via the I2C interface.
This document will focus on addressing the following core technical issues:
- Software Architecture: Understanding the ALSA (Advanced Linux Sound Architecture) audio subsystem and the ASoC (ALSA System on Chip) framework on the Qualcomm platform.
- Kernel Configuration: Guidance on selecting and compiling necessary drivers (Machine Drivers, Platform Drivers, Codec Drivers).
- Device Tree Integration (DTS): Declaring the hardware and configuring GPIO pin muxing, regulators, and clocks so that the system can recognize and control the device.
- Testing and Verification: Using user-space tools such as tinyalsa (tinyplay, tinymix) and amixer to test audio paths and signal quality.
1.2. Scope & Out-of-scope Features
To keep the integration process focused and efficient, this document clearly separates the work items that are in scope from those that are out of scope.
In Scope:
- Hardware Communication Setup: Instructions for connecting and configuring the physical interface between the C10QM and the TLV320AIC3254, including the MI2S data bus (MCLK, BCLK, WCLK, DIN, DOUT) and the I2C control bus.
- Driver Development and Integration: Activating the driver for the TLV320AIC32x4 codec in the Linux Kernel. Declaring a virtual Sound Card in the Device Tree to link the CPU DAI (Digital Audio Interface) and the Codec DAI.
- Audio Routing Configuration: Using kcontrols (Mixer controls) to set the signal path within the Codec (e.g., routing from DAC to Headphone Output).
- Playback Feature: Focuses on enabling PCM audio output to the headphone jack (3.5mm) in Stereo mode.
Out of Scope:
- Advanced Digital Signal Processing (DSP): Does not cover acoustic echo cancellation (AEC), noise suppression (NS), or audio effects (EQ, 3D) on Qualcomm’s Hexagon DSP.
- Application Layer Encoding/Decoding: Does not include integration of decoding libraries for compressed formats (MP3, AAC, Opus, OGG…). The document only covers playback of .wav files (raw PCM).
- Firmware Development for CC3200: Does not include any source code or configuration for the CC3200 MCU chip.
- Analog Circuit Design: Does not delve into the design of filters, capacitors, or analog amplifier circuits. It is assumed that the CC3200 Audio Boost hardware is electrically sound.
- VoLTE/CSFB Call: This document focuses on the System Audio path and does not include detailed configuration for voice stream during mobile network calls.
2. System Overview for TLV320AIC3254 Audio Codec
2.1. Hardware Architecture Overview
The system is built on a Host-Codec model, where the C10QM acts as the master device controlling all operations, and the CC3200 Audio Boost acts as an expansion board providing an analog audio interface. The connection between the two components is based on two industry-standard physical interfaces:
Data Link – Interface MI2S
- Protocol: Uses the MI2S (Multi-channel Inter-IC Sound) standard, also known as Primary I2S.
- Function: Responsible for transmitting raw PCM audio data in real time.
- Data direction: The audio system is designed to operate in full-duplex mode, allowing simultaneous playback and capture on the same I2S/PCM physical interface. Although sharing a common clock bus, the data of these two streams travels on separate signal lines, ensuring no signal interference and supporting independent debugging for each communication direction:
Playback (DL):

For downlink, audio data travels from the host device to the peripheral device. Specifically, digital PCM data is output by the C10QM module’s CPU via the PCM_DOUT pin (Pin 48). This signal enters the DIN pin of the TLV320AIC3254 codec. Here, the codec processes the signal through an interpolation filter and a digital-to-analog converter (DAC) to convert it into an analog signal. The final signal is then passed through the amplifier (Headphone Driver) and output to the 3.5mm jack. Software-wise, the DAPM (Dynamic Audio Power Management) path is routed from the “Left/Right DAC” through the corresponding mixers to the HPL/HPR output.
Capture (UL):
In contrast to the transmission stream, the uplink stream is responsible for bringing the signal from the environment into the processing system. The analog signal from the microphone enters the analog input of the codec, is amplified by the PGA (Programmable Gain Amplifier), and converted to digital via the ADC. The digital data is then output from the DOUT pin of the codec and goes to the PCM_DIN pin (Pin 47) of the C10QM module. At the C10QM side, the CPU receives this data stream to perform recording or processing for voice calls.
- Clocking:
The C10QM operates in Master mode and is responsible for providing the critical clock signals (MCLK, BCLK, WCLK) that synchronize the Codec’s data sampling. The stable operation of both data streams depends entirely on this clocking: the C10QM generates and maintains the bit clock (PCM_CLK, pin 49), which synchronizes each data bit, and the frame sync (PCM_SYNC, pin 46), which marks the start of each audio frame. Loss of either clock signal causes both transmission and reception to stop immediately.

Control Link – Interface I2C

The I2C interface acts as the “command center” of the audio system, operating in parallel but completely separate from the audio data stream. While the MI2S interface is responsible for transporting digital signals (PCM), the I2C (Inter-Integrated Circuit) protocol is used specifically to configure and monitor the operating status of the Codec chip. Here, the C10QM module acts as the master device, sending read/write commands to the registers of the TLV320AIC3254 to perform complex control tasks:
- Power Management: Controls the power supply or power off for specific functional blocks within the Codec (such as turning off the ADC block when only listening to music, or turning on the Mic-Bias block when recording) to optimize power consumption for the device.
- Clocking System Configuration: Set parameters for the phase-locked loop (PLL) and internal dividers, ensuring the codec can generate standard sampling frequencies (such as 44.1kHz, 48kHz) from the original MCLK clock provided by the C10QM.
- Audio Routing: Control the internal switch matrix of the codec to determine the signal path, for example, connecting the signal from the DAC to headphones or transferring the signal from the microphone to the ADC converter.
- Signal Conditioning: Set digital volume levels, configure the analog gain for the microphone to avoid distortion, and configure audio filters if available.
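The control-path wiring described above eventually has to be declared to the kernel. As a rough illustration only, a codec node on the host I2C bus might look like the sketch below. The bus label (`i2c_4`) and regulator phandle (`&vreg_audio`) are placeholders, the supply names follow the upstream tlv320aic32x4 device-tree binding, and 0x18 is the AIC3254 default I2C address; adapt all of them to the actual board:

```dts
/* Hedged sketch -- bus label and regulator references are placeholders */
&i2c_4 {
    status = "okay";

    audio-codec@18 {
        compatible = "ti,tlv320aic32x4";  /* driver family used by this guide */
        reg = <0x18>;                     /* AIC3254 default I2C address */
        iov-supply = <&vreg_audio>;       /* digital I/O supply */
        ldoin-supply = <&vreg_audio>;     /* on-chip LDO input */
        dv-supply = <&vreg_audio>;        /* digital core supply */
        av-supply = <&vreg_audio>;        /* analog supply */
    };
};
```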
2.2. Functional Role of Each System Component
Module Cavli C10QM (Follow Qualcomm MDM9607 SoC)
Acting as the Audio Master and considered the central controller of the system, this component operates on an embedded Linux operating system using the ALSA (Advanced Linux Sound Architecture) audio architecture. It integrates machine-level and platform-level drivers to manage all communication with the hardware. Core tasks include initiating the I2C protocol to activate (‘wake up’) and load configuration for the Codec at boot time, managing the audio buffer and controlling DMA to transmit PCM data over the MI2S interface, and providing the essential base clock (MCLK) source to ensure the Codec operates synchronously.
Chip Codec TI TLV320AIC3254
- Features: A low-power stereo audio codec with an integrated miniDSP.
- Main functions: DAC (Digital-to-Analog Converter): Receives digital signals from the C10QM, converts them into analog signals for headphone output; ADC (Analog-to-Digital Converter): Receives electrical signals from the microphone, converts them into digital signals to send back to the C10QM; Mixer & Amp: Mixes signals and amplifies power to drive headphones or external speakers.
CC3200 Audio Boost (Hardware Carrier)
- Function: Provides the electrical infrastructure for the Codec to operate (power supply, filter capacitors, 3.5mm jack, connector header).
- Special Note (Hardware Bypass): By default, this board is designed to work with the CC3200 MCU. In this integration guide, the CC3200 MCU on the board will be completely disabled, and the I2S and I2C signal lines of the Codec (which are connected to the CC3200 MCU) will be disconnected and directly connected (Jump wiring) to the corresponding GPIO pins on the C10QM module. This turns the CC3200 Audio Boost into a pure “Breakout board” for the TLV320AIC3254 chip.
3. Linux Audio Software Architecture (ALSA ASoC)
The audio system on the C10QM (MDM9607) is built on the ALSA System on Chip (ASoC) architecture. This is a standard framework in the Linux kernel designed specifically for embedded systems, separating the SoC (platform), codec, and board (machine) drivers so each can be reused independently.
3.1. Audio Data Path in ALSA ASoC Framework
This is the journey of audio data from a music file to the user's headphones. The playback flow proceeds as follows:
- User Space (Application Layer): Tools like tinyplay, aplay, or Android Audio HAL read data from the music file (wav, pcm).
- ALSA PCM Interface: Data is pushed down to the kernel via the standard PCM interface (/dev/snd/pcmCxDxp). Here, ALSA manages the Ring Buffer.
- Platform Driver (SoC DMA/DSP): On the Qualcomm MDM9607 chip, data is not directly pushed to the I2S pin by the CPU. Instead, it is transferred to the digital signal processing unit (DSP/LPASS) via DMA.
- CPU DAI (Digital Audio Interface): The I2S interface on the SoC (Primary MI2S) receives data from memory and outputs digital signals (MCLK, BCLK, WS, DATA) externally.
- Codec DAI: The TLV320AIC3254 chip receives I2S signals.
- Codec Processing: Inside the codec, data passes through an interpolation filter, a DAC (Digital-to-Analog) converter, and an amplifier (Amp).
- Analog Output: The analog signal is output to the headphone jack.
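The user-space end of this path can be exercised with the standard tinyalsa or alsa-utils tools once the card is up. A sketch (card and PCM device numbers are assumptions; check /proc/asound/pcm on the target):

```shell
# tinyalsa: play a WAV file through card 0, PCM device 0
tinyplay /data/sound.wav -D 0 -d 0

# alsa-utils equivalent, if aplay is installed
aplay -D hw:0,0 /data/sound.wav
```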
3.2. Audio Control Path (Mixer, Codec Registers)
In parallel with the data path, the control path is responsible for hardware configuration. It carries no audio signal.
- Physical protocol: Uses I2C bus.
- Kernel mechanism: Codec drivers register kcontrols (Kernel Controls) with the ALSA Core.
- Task performed:
- Power Management (DAPM): automatically powers the blocks (DAC, ADC, Amp) on and off as audio streams start and stop, to save power.
- Register Config: Configures the PLL registers to create the clock and the Volume register to adjust the volume.
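From user space, these kcontrols can be listed and set with tinymix. The control names below are illustrative only; the real names are defined by the codec driver and vary by kernel version:

```shell
# List every kcontrol registered by the codec and platform drivers
tinymix

# Read, then set, a single control (name is an assumption; pick from the list)
tinymix "DAC Playback Volume"
tinymix "DAC Playback Volume" 120
```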
4. Linux Kernel Configuration for TLV320AIC3254
To enable drivers and the audio system on the C10QM, changes to the kernel configuration and build system are required.
4.1. Enable Subsystem in Defconfig
Modify the defconfig file mdm9607-perf_defconfig (path: c10qm_linux_4/build/c10qm/eaj_v3.2/linux_change/v2/) to enable the audio subsystem, and add the corresponding information to the Makefile.
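The exact symbols depend on the kernel tree, but a typical set of options to enable in mdm9607-perf_defconfig would look like the following sketch (CONFIG_SND_SOC_TLV320AIC32X4 is the mainline symbol for this codec driver family; the others are the standard ALSA/ASoC prerequisites):

```
CONFIG_SOUND=y
CONFIG_SND=y
CONFIG_SND_SOC=y
CONFIG_SND_SOC_TLV320AIC32X4=y
```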
4.2. Disable Audio flag check (AUDIO_ENABLE_FILE)
To prevent the audio module from being skipped during startup, modify the startup script start_audio_le (path: apps_proc/poky/meta-qti-bsp/recipes-multimedia/audio_dlkm_kernel/files/mdm9607/).
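The change itself is small: the script guards audio bring-up behind a flag-file check, and that guard must not cause an early exit. A hedged sketch of the kind of edit (the AUDIO_ENABLE_FILE variable name is taken from the section title; the surrounding lines are illustrative, not the literal script contents):

```shell
# Before (illustrative): bail out when the enable flag is missing
# [ -f "$AUDIO_ENABLE_FILE" ] || exit 0

# After: remove or comment out the check so the audio modules always load
# (alternatively, create the flag file in the filesystem image at build time)
```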
4.3. Add Sound Card Support to the Build System
Modify the following Makefiles to ensure mdm9607 is compiled correctly:
- File /apps_proc/vendor/qcom/opensource/audio-kernel/Makefile.am: add the sound card to the conditional block.
- File /apps_proc/vendor/qcom/opensource/audio-kernel/Makefile.
- Edit the files under /apps_proc/vendor/qcom/opensource/audio-kernel/asoc.
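As a sketch of the shape of the Makefile.am change (the conditional and symbol names are assumptions; the real names are defined by the audio-kernel build files in this tree):

```makefile
# Illustrative only: build the mdm9607 machine and codec drivers
# when the target matches
if MDM9607
KBUILD_OPTIONS += CONFIG_SND_SOC_MDM9607=m
KBUILD_OPTIONS += CONFIG_SND_SOC_TLV320AIC32X4=m
endif
```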
5. Device Tree Integration for TLV320AIC3254
You need to declare a node for the TLV320AIC3254 codec on the I2C bus and configure the sound card to link the components.
5.1. Declare the I2C Controller and Codec Node
Add a tlv320aic32x4 node under the i2c_5 node (BLSP1 QUP5).
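A representative node, following the mainline ti,tlv320aic32x4 device-tree binding (the I2C address matches the chip family's default; the clock and regulator labels are assumptions to be adapted to the actual schematic):

```dts
&i2c_5 {
    status = "okay";

    tlv320aic32x4: audio-codec@18 {
        compatible = "ti,tlv320aic32x4";
        reg = <0x18>;                  /* default 7-bit I2C address */
        clocks = <&codec_mclk>;        /* MCLK supplied by the C10QM */
        clock-names = "mclk";
        iov-supply = <&vreg_audio>;    /* supply names per the binding */
        ldoin-supply = <&vreg_audio>;
    };
};
```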
5.2. Declare the Regulators (Power Supplies)
Add virtual power nodes if the hardware does not use PMIC for direct control.
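Since the CC3200 Audio Boost powers the codec from a fixed rail rather than a PMIC, a fixed-regulator node can stand in for the supply. A sketch (the 3.3 V value and the labels are assumptions):

```dts
/ {
    vreg_audio: vreg-audio {
        compatible = "regulator-fixed";
        regulator-name = "vreg_audio";
        regulator-min-microvolt = <3300000>;
        regulator-max-microvolt = <3300000>;
        regulator-always-on;
    };
};
```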
5.3. Declare the Sound Card (MDM9607 Audio Common)
Define the sound card to link the CPU DAI and the Codec DAI.
Note: You need to disable any other default sound cards (e.g., sound-9330, sound-9306) to avoid conflicts.
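Qualcomm trees normally define the sound card through a vendor-specific machine-driver binding, so the exact node depends on this kernel. As a generic illustration of how a CPU DAI and a codec DAI are linked, here is the equivalent using the mainline simple-audio-card binding (all labels are assumptions):

```dts
sound {
    compatible = "simple-audio-card";
    simple-audio-card,name = "mdm9607-tlv320";
    simple-audio-card,format = "i2s";
    /* The C10QM drives BCLK/WS, per the hardware design */
    simple-audio-card,bitclock-master = <&cpu_dai>;
    simple-audio-card,frame-master = <&cpu_dai>;

    cpu_dai: simple-audio-card,cpu {
        sound-dai = <&pri_mi2s>;       /* Primary MI2S label is an assumption */
    };
    simple-audio-card,codec {
        sound-dai = <&tlv320aic32x4>;
    };
};
```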
6. Verifying TLV320AIC3254 Driver Operation
6.1. Checking Sound Card Registration
Use cat /proc/asound/cards to confirm that the system has recognized the new sound card.
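On the target, a successful registration looks something like this (the card name is an assumption; it comes from the sound-card node or machine driver):

```shell
cat /proc/asound/cards
#  0 [mdm9607tlv320  ]: mdm9607-tlv320 - mdm9607-tlv320
#                       mdm9607-tlv320-snd-card
```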
6.2. Testing Audio Playback Using WAV Files
Configure the Mixer and play the music file to your headphones.
- Hardware connection: Plug your headphones into the J4 jack on the CC3200 codec board.
- Mixer configuration (Routing): Run the following commands to enable audio routing:
- Playback: push the sound.wav file to the device and play it through the headphones.
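A hedged sketch of the full test sequence (control names, card numbers, and file paths are assumptions; list the real control names with a bare tinymix first):

```shell
# Enable the headphone playback path (control names are assumptions
# for the tlv320aic32x4 driver; pick the real ones from `tinymix`)
tinymix "HP Driver Playback Switch" 1
tinymix "PCM Playback Volume" 120

# Push the test file from the host, then play it on card 0, device 0
adb push sound.wav /data/
tinyplay /data/sound.wav -D 0 -d 0
```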