DataPump Service
The DataPump Service continuously pushes machine telemetry data from the Proemion DataPlatform into customer-managed external cloud storage. Data arrives passively as the telematics gateways transmit it; no active polling or request handling is required.
DataPump delivers time-series metrics (including position data) and Diagnostic Trouble Codes (DTCs).
Time-series data is exported as .csv files, DTCs as .json files.
The service is designed for customers who want direct access to their machine data, whether to build custom applications, to ensure data ownership and portability, or to meet regulatory requirements such as the EU Data Act.
It is the recommended alternative to retrieving data via the REST API time series endpoint, particularly when large amounts of data are needed (e.g. for big data analytics) or when high update rates make polling inefficient.
The following diagram shows the data flow of the DataPump Service.
```mermaid
flowchart LR
    Machine -->|CAN Bus| CU[Communication Unit]
    CU -->|Cellular| DP[DataPlatform\nInterpretation]
    DP --> DPS[DataPump]
    DPS -->|CSV / JSON| CS[Customer\nCloud Storage]
```
Use Cases
The following use cases illustrate how customers are using DataPump to integrate machine data into their own systems and workflows.
- Custom analytics platform integration: DataPump enables customers to build their own applications, dashboards, or big data analytics solutions on top of the raw telemetry data delivered to their cloud storage.
- EU Data Act compliance: DataPump can be used to forward machine data to third parties as required under the EU Data Act, supporting obligations around user data access rights and switching between data processing services. The DataPlatform REST API can also be used for this purpose. See EU Data Act for further information.
- Data ownership and portability: Customers who want to retain a full copy of their machine telemetry data independent of the Proemion DataPlatform can use DataPump to continuously mirror data into their own storage infrastructure.
Comparison with the REST API
For comparison, retrieving equivalent data via the REST API time series endpoint requires separate requests per machine and per signal, each for the relevant time range, on every update cycle. At high update rates, this results in a large number of requests, making the REST API approach slow and inefficient for large fleets.
The REST API applies throttling and response size limits to protect service availability. For details, see REST API throttling.
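To make the scaling argument concrete, the following sketch counts REST requests per hour under the polling model described above, assuming exactly one request per machine-signal pair per update cycle. The fleet size, signal count, and polling interval in the example are hypothetical; the real request count also depends on the endpoint's parameters and on throttling.

```python
def polling_requests_per_hour(machines: int, signals_per_machine: int,
                              poll_interval_s: float) -> int:
    """Requests per hour if every (machine, signal) pair is polled once per cycle.

    Assumption: one request per machine-signal pair per update cycle.
    """
    if poll_interval_s <= 0:
        raise ValueError("poll_interval_s must be positive")
    cycles_per_hour = 3600 / poll_interval_s
    return round(machines * signals_per_machine * cycles_per_hour)


# Hypothetical fleet: 1000 machines, 20 signals each, polled every 60 s.
print(polling_requests_per_hour(1000, 20, 60))  # 1200000
```

The count grows linearly with fleet size, signal count, and update rate, which is why the table lists the DataPump request overhead as "None": the consumer only processes files that arrive.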
The following table summarizes the key differences between the two approaches:
| Characteristic | REST API Time Series | DataPump Service |
|---|---|---|
| Data retrieval model | Pull (polling) | Push (event-driven) |
| Request overhead | High — per machine, per signal | None |
| Latency at high update rates | Increases with fleet size | Independent of fleet size |
| Implementation complexity | High — requires polling logic and state tracking | Lower — consume incoming files |
| Suitable for large fleets | Limited | Yes |
Setup
The DataPump Service is not a self-service feature. Setup, configuration changes, and deletion are handled by Proemion Customer Success.
Note
The DataPump Service is a paid add-on to the standard DataPlatform service. To order the service, contact your Proemion account manager.
The service supports all machine and device types available on the DataPlatform and is enabled at the level of an organization unit. All machines within the selected unit and its subunits, as well as machines shared with the organization, are included in the data stream.
Supported Cloud Storage Targets
The DataPump Service supports the following cloud storage platforms:
| Cloud Provider | Storage Type |
|---|---|
| Microsoft | Azure Blob Storage |
| Amazon | AWS S3 |
| Google | Google Cloud Storage |
Up to two buckets or containers can be configured per setup: one for .csv time-series metric files and one for .json DTC files.
Configuring both is not required; each dataset can be enabled independently.
When both are configured, they typically use the same credentials.
For Azure Blob Storage, a connection string is the only supported authentication method.
For Google Cloud Storage, an HMAC access key and secret are required. Google Cloud Storage is accessed via its S3-compatible endpoint; a native GCS service account JSON key is not supported.
Note
Only one DataPump should be configured per organization unit at a time.
Required Information for Setup
To request a DataPump setup, provide the following information to Proemion Customer Success:
- Target DataPortal organization unit to enable the service for.
- Target cloud storage platform.
- Bucket or container name(s) for the dataset(s) to be enabled.
- Credentials for the storage bucket or container.
- Desired output options: compression (default: gzip), format (default: .csv with GNSS), and file generation interval.
File Generation
The telematics gateway continuously records machine data in log files (.clf files) and transmits them to the DataPlatform.
The DataPump Service processes these files to generate the export files delivered to the customer's cloud storage.
Time-series data and DTC data follow different file generation rules and are generated independently — there is no direct file-to-file correspondence between CSV and JSON files.
To correlate DTC and time-series data, match the timestamp field in the CSV against the start and end fields in the DTC JSON files.
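A minimal correlation sketch follows. It assumes that the DTC machine field holds the same identifier as the CSV machine_id column, rendered as a string; this mapping is an assumption and is not confirmed here. A DTC that is still active has no end timestamp yet and is treated as open-ended. DtcInterval and dtcs_active_at are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DtcInterval:
    machine: str          # assumption: same value as the CSV machine_id, as a string
    source: int
    spn: int
    fmi: int
    start: int            # Unix timestamp in ms
    end: Optional[int]    # None while the DTC is still active


def dtcs_active_at(timestamp_ms: int, machine_id: int,
                   dtcs: List[DtcInterval]) -> List[DtcInterval]:
    """Return the DTCs of one machine whose [start, end] interval contains timestamp_ms."""
    return [
        d for d in dtcs
        if d.machine == str(machine_id)
        and d.start <= timestamp_ms
        and (d.end is None or timestamp_ms <= d.end)  # open-ended if not yet completed
    ]


# Illustrative values only.
dtcs = [
    DtcInterval("60334", 0, 198, 3, 1500016040000, 1500016046231),
    DtcInterval("60334", 0, 151, 3, 1500016040000, None),
]
print([d.spn for d in dtcs_active_at(1500016050000, 60334, dtcs)])  # [151]
```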
Time-Series Files
Time-series data is exported as .csv files, gzip-compressed by default. A new file is created when the configured time interval expires or when sufficient data has accumulated, whichever comes first.
A single CSV file aggregates data from up to 1000 .clf files.
The file generation interval can be configured individually for each DataPump setup by Proemion Customer Success. The configurable interval range is 30–300 seconds. The default interval is 150 seconds.
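The batching rule can be modeled as a buffer that is closed when the interval elapses or when it holds a full batch of .clf files. This is an illustrative model, not Proemion's implementation. It assumes that "sufficient data" means 1000 .clf files, that the interval is measured from the first .clf file of a batch, and that no file is produced for an interval without data. CsvBatcher, add, and tick are hypothetical names.

```python
from typing import List, Optional

MIN_INTERVAL_S, MAX_INTERVAL_S, DEFAULT_INTERVAL_S = 30, 300, 150
MAX_CLF_PER_FILE = 1000  # assumption: "sufficient data" = a full batch of 1000 .clf files


class CsvBatcher:
    """Illustrative model of the time-series batching rule (hypothetical, not Proemion code)."""

    def __init__(self, interval_s: int = DEFAULT_INTERVAL_S) -> None:
        if not MIN_INTERVAL_S <= interval_s <= MAX_INTERVAL_S:
            raise ValueError(f"interval must be {MIN_INTERVAL_S}-{MAX_INTERVAL_S} s")
        self.interval_s = interval_s
        self.batch: List[str] = []
        self.opened_at: Optional[float] = None  # arrival time of the batch's first .clf

    def _close(self) -> List[str]:
        closed, self.batch, self.opened_at = self.batch, [], None
        return closed

    def add(self, clf_name: str, now_s: float) -> Optional[List[str]]:
        """Add one .clf file; return the closed batch if this closes one."""
        if self.opened_at is None:
            self.opened_at = now_s
        self.batch.append(clf_name)
        if len(self.batch) >= MAX_CLF_PER_FILE:
            return self._close()
        return self.tick(now_s)

    def tick(self, now_s: float) -> Optional[List[str]]:
        """Timer check: close a non-empty batch once its interval has expired."""
        if self.opened_at is not None and now_s - self.opened_at >= self.interval_s:
            return self._close()
        return None


b = CsvBatcher(interval_s=30)
print(b.add("a.clf", 0))  # None
print(b.tick(29))         # None
print(b.tick(30))         # ['a.clf']
```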
Note
Smaller intervals increase update frequency but result in disproportionately higher transmission and storage costs.
The default values represent a compromise between update frequency and file size efficiency.
File sizes depend on the number of machines, signals, and data points in the export. The following are estimates for a setup with 1000 machines and a full batch of 1000 .clf files. Actual sizes will be smaller for smaller batches or fewer machines.
| | Uncompressed CSV | gzip compressed |
|---|---|---|
| Expected (median) | ~70 MB | ~5–9 MB |
| Upper bound (~p98) | ~100 MB | ~10–13 MB |
For the full file format specification including field definitions and examples, see CSV File Structure.
DTC Files
DTCs are treated as events with state transitions and lamp states, and are exported as .json files.
One .json file is generated per .clf file received. DTC files are delivered without batching — data is typically available in the customer's cloud storage within 5 seconds of receipt on the DataPlatform.
For the full DTC file format specification including field definitions, transmission behavior, and examples, see DTC File Structure.
Data Retention and Delivery
If the customer's cloud storage is temporarily unavailable, the DataPlatform retains data in a persistent queue and continues retrying delivery. Data is retained for up to 24 hours. If a file delivery is interrupted, the file is re-delivered automatically.
CSV File Structure
The DataPump Service exports time-series data as .csv files. Depending on the DataPump configuration, files may be delivered uncompressed or compressed with gzip (.csv.gz).
Two format variants are supported for the measurements dataset:
| Format | Description |
|---|---|
| csv | Standard CSV format per RFC 4180, including GNSS columns |
| csv-no-gnss | Standard CSV format per RFC 4180, without GNSS columns |
Each row represents one signal value at a specific point in time for a specific machine. The CSV file has no header row. The following seven columns are present in both formats:
| Field | Type | Description |
|---|---|---|
| CU-ID | String | Unique identifier of the Communication Unit, usually the IMEI number |
| machine_id | Long (64-bit) | Internal numeric identifier of the machine |
| machine_name | String | Human-readable name of the machine |
| machine_serialnumber | String | Serial number of the machine |
| timestamp | Integer | Unix timestamp in milliseconds of the data point |
| signal_key | String | Identifier of the signal (e.g. value.common.engine.hours.total) |
| signal_value | String | Measured value of the signal at the given timestamp |
Signal labels are not included in the file — these must be resolved externally.
The following columns are only present when GNSS export or OEM external key export is enabled in the DataPump configuration:
| Field | Type | Description | Condition |
|---|---|---|---|
| latitude | Float | GPS latitude of the machine | csv format only |
| longitude | Float | GPS longitude of the machine | csv format only |
| altitude | Float | GPS altitude of the machine | csv format only |
| heading | Float | GPS heading of the machine | csv format only |
| oem_external_key | String | OEM-specific external machine identifier | When OEM external key export is enabled |
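A minimal reader sketch for header-less rows in either format, using Python's standard csv module (which handles RFC 4180 quoting) and gzip for .csv.gz files. Assumptions: files are UTF-8 encoded, GNSS columns directly follow signal_value in the order latitude, longitude, altitude, heading (as in the GNSS example data), an empty GNSS field means no fix, and the oem_external_key column is not parsed because its position is not specified here. Measurement, parse_rows, and open_export are hypothetical names.

```python
import csv
import gzip
import io
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, TextIO


@dataclass
class Measurement:
    cu_id: str
    machine_id: int
    machine_name: str
    machine_serialnumber: str
    timestamp_ms: int
    signal_key: str
    signal_value: str                  # kept as a string; interpret per signal_key
    latitude: Optional[float] = None   # GNSS columns exist only in the csv format
    longitude: Optional[float] = None
    altitude: Optional[float] = None
    heading: Optional[float] = None


def _opt_float(text: str) -> Optional[float]:
    # Assumption: an empty GNSS field means "no fix" rather than an error.
    return float(text) if text else None


def parse_rows(lines: Iterable[str]) -> Iterator[Measurement]:
    """Parse header-less DataPump rows in csv or csv-no-gnss format."""
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        m = Measurement(row[0], int(row[1]), row[2], row[3], int(row[4]), row[5], row[6])
        if len(row) >= 11:  # assumption: GNSS columns directly follow signal_value
            m.latitude, m.longitude, m.altitude, m.heading = (_opt_float(v) for v in row[7:11])
        yield m


def open_export(path: str) -> TextIO:
    """Open a .csv or .csv.gz export as text (UTF-8 is an assumption)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", newline="")
    return open(path, encoding="utf-8", newline="")


sample = ('"357520077898886",110309,"PRO Donau logging_3k","357520077898886",1624406357000,'
          '"value.common.cu.sensors.acc.x","-0.07910156",50.5401818,9.6828053,281.112,177.36062\n')
m = next(parse_rows(io.StringIO(sample)))
print(m.machine_id, m.signal_key, m.latitude)  # 110309 value.common.cu.sensors.acc.x 50.5401818
```

Signal values stay strings because the file carries no type information per signal; converting them is left to code that knows each signal_key.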
CSV example data, csv-no-gnss format:
```csv
"352648068064187",59454,"PRO Donau logging_5k","253004046 SN1560002 352648068064187",1624406349000,"value.common.engine.hours.total","63947720.35"
"352648068064187",59454,"PRO Donau logging_5k","253004046 SN1560002 352648068064187",1624406349000,"value.common.machine.hours.pto.total","13.75"
"352648068064187",59454,"PRO Donau logging_5k","253004046 SN1560002 352648068064187",1624406349000,"value.common.engine.fuel.used.total","172.0"
"352648068064187",59454,"PRO Donau logging_5k","253004046 SN1560002 352648068064187",1624406349000,"value.common.engine.fuel.level","82.0"
"357520077898886",110309,"PRO Donau logging_3k","357520077898886",1624406357000,"value.common.cu.sensors.acc.x","-0.07910156"
"357520077898886",110309,"PRO Donau logging_3k","357520077898886",1624406357000,"value.common.cu.sensors.acc.y","0.0390625"
```
CSV example data, csv format with GNSS:
```csv
"352648068064187",59454,"PRO Donau logging_5k","253004046 SN1560002 352648068064187",1624406349000,"value.common.engine.hours.total","63947720.35",50.5401818,9.6828053,281.112,177.36062
"352648068064187",59454,"PRO Donau logging_5k","253004046 SN1560002 352648068064187",1624406349000,"value.common.machine.hours.pto.total","13.75",50.5401818,9.6828053,281.112,177.36062
"357520077898886",110309,"PRO Donau logging_3k","357520077898886",1624406357000,"value.common.cu.sensors.acc.x","-0.07910156",50.5401818,9.6828053,281.112,177.36062
"357520077898886",110309,"PRO Donau logging_3k","357520077898886",1624406357000,"value.common.machine.geo.speed","0.036",50.5401818,9.6828053,281.112,177.36062
```
DTC File Structure
The DataPump Service exports Diagnostic Trouble Code (DTC) data as .json files.
Depending on the DataPump configuration, files may be delivered uncompressed or compressed with gzip (.json.gz).
Each file contains a set of events describing active and completed DTCs and lamp states.
Data becomes available once the device is configured for J1939 processing (see J1939 DM1 DM2) and the DataPump service for the DTC and lamp state dataset is enabled on the DataPlatform. Historic DTCs are not available. SPN names, controller names, and FMI descriptions are not included in the file; these must be resolved externally.
Data Model
DTCs are treated as events. A DTC can become active and subsequently completed, and lamp states are included in the same file. The file tracks state transitions with start and end times.
Each JSON file contains two top-level objects:
- dtcs: contains DTC events
- lamps: contains lamp state events
Both objects contain two arrays:
- active: events that became active in this transmission
- completed: events that were completed in this transmission
Note
The dtcs and lamps objects, including their active and completed arrays, are mandatory. The arrays can be empty.
DM1/DM2 and DM53 DTCs are both included in the dtcs arrays without distinction. As a result, the same (source, SPN, FMI) triple may appear more than once in active[] if it is reported through more than one diagnostic message. Consumers must account for this when processing DTC data.
JSON files are named by a randomly generated unique identifier (UUID), for example 3716b7cc-b44a-44c9-a69f-355e2887e865.
The top-level object includes a version property identifying the file format version.
DTC Properties
Each DTC is represented by an object with the following properties:
| Property | Type | Description |
|---|---|---|
| machine | String | Machine identifier |
| source | Integer | Source address of the controller |
| spn | Integer | Suspect Parameter Number |
| fmi | Integer | Failure Mode Identifier |
| start | Integer | Unix timestamp in milliseconds when the DTC became active |
| end | Integer | Unix timestamp in milliseconds when the DTC was completed (completed DTCs only) |
A DTC event is identified by the combination of source, spn, and fmi. Two DTCs with the same source and spn but different fmi are treated as two independent, simultaneously active entries.
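Because one file can contain DTCs from several machines and the same key can repeat in active[], a consumer may want to collapse duplicates before further processing. This sketch builds the identity key from source, spn, and fmi as described, adds machine to scope it per machine (an assumption), and keeps the entry with the earliest start (a design choice, not documented behavior). dtc_key and unique_active are hypothetical names.

```python
from typing import Dict, List, Tuple

DtcKey = Tuple[str, int, int, int]


def dtc_key(dtc: dict) -> DtcKey:
    """(source, spn, fmi) identify a DTC; machine is added to scope the key per machine."""
    return (dtc["machine"], dtc["source"], dtc["spn"], dtc["fmi"])


def unique_active(active: List[dict]) -> List[dict]:
    """Collapse repeated keys in an active[] array, keeping the earliest start."""
    first: Dict[DtcKey, dict] = {}
    for dtc in active:
        key = dtc_key(dtc)
        if key not in first or dtc["start"] < first[key]["start"]:
            first[key] = dtc
    return list(first.values())


active = [
    {"machine": "60334", "source": 0, "spn": 151, "fmi": 3, "start": 1500016041000},
    {"machine": "60334", "source": 0, "spn": 151, "fmi": 3, "start": 1500016040000},  # repeat
    {"machine": "60334", "source": 0, "spn": 151, "fmi": 4, "start": 1500016040000},  # other FMI
]
print(len(unique_active(active)))  # 2
```

The entry with a different fmi survives as its own item, matching the rule that such DTCs are independent, simultaneously active entries.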
Lamp State Properties
Each lamp state is represented by an object with the following properties:
| Property | Type | Description |
|---|---|---|
| machine | String | Machine identifier |
| source | Integer | Source address of the controller |
| type | String | Lamp type: rsl, mil, awl, or protect |
| lamp | Integer | Lamp status: 0, 1, 2, or 3, corresponding to 0b00, 0b01, 0b10, 0b11 |
| flash | Integer | Flash status: 0, 1, 2, or 3, corresponding to 0b00, 0b01, 0b10, 0b11 |
| start | Integer | Unix timestamp in milliseconds when the lamp state became active |
| end | Integer | Unix timestamp in milliseconds when the lamp state was completed (completed states only) |
Lamp types according to the J1939 specification:
| Value | Description |
|---|---|
| rsl | Red stop lamp |
| mil | Malfunction indicator lamp |
| awl | Amber warning lamp |
| protect | Protect lamp |
A lamp state event is identified by the combination of source, type, lamp, and flash. A new event is recorded when this combination appears in or disappears from the active set.
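For display purposes, the numeric lamp and flash values can be mapped to text. The mapping below follows our reading of the DM1 lamp encoding in SAE J1939-73 (lamp: 00 off, 01 on; flash: 00 slow, 01 fast, 11 unavailable or do not flash); it is not stated in this document, so verify it against the specification your controllers implement. describe_lamp and the lookup tables are hypothetical names.

```python
LAMP_TYPES = {
    "rsl": "Red stop lamp",
    "mil": "Malfunction indicator lamp",
    "awl": "Amber warning lamp",
    "protect": "Protect lamp",
}
# Assumed meanings, based on the SAE J1939-73 DM1 lamp encoding.
LAMP_STATUS = {0: "off", 1: "on", 2: "reserved", 3: "not available"}
FLASH_STATUS = {0: "slow flash", 1: "fast flash", 2: "reserved", 3: "unavailable/do not flash"}


def describe_lamp(event: dict) -> str:
    """Return a readable summary of one lamp state event."""
    return (f"{LAMP_TYPES[event['type']]} (source {event['source']}): "
            f"lamp={LAMP_STATUS[event['lamp']]}, flash={FLASH_STATUS[event['flash']]}")


event = {"machine": "60334", "source": 0, "type": "rsl", "lamp": 1, "flash": 0,
         "start": 1500016040000}
print(describe_lamp(event))  # Red stop lamp (source 0): lamp=on, flash=slow flash
```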
Transmission Behavior
A DTC or lamp state appears in the active array only once, in the file in which it is first known to be active.
While it remains active, it does not appear again, even if the device restates it.
Once it is completed, it appears in the completed array with both start and end timestamps.
Scenario 1: DTC Active and Completed in the Same File
If a DTC becomes active and completes within the same .clf file, it appears in both the active and completed arrays of the same JSON file.
```
.clf file
    │
    ├── DTC becomes active
    └── DTC completed
        │
        └──► Single JSON file
             ├── active: [ DTC ]
             └── completed: [ DTC ]
```
Scenario 2: DTC Active and Completed in Different Files
If a DTC becomes active in one .clf file and is completed in a later one, the active and completed events appear in different JSON files. JSON files generated from the .clf files in between contain no events for that DTC.
```
.clf file 1          .clf files 2…N        .clf file N+1
     │                     │                     │
     ├── DTC active        │                     ├── DTC completed
     │                     │                     │
     ▼                     │                     ▼
JSON file 1                │               JSON file N+1
active: [ DTC ]            │               completed: [ DTC ]
                           │
                    (no DTC events)
```
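Given these transmission rules, a consumer can reconstruct which DTCs are currently open by replaying files in the order they were generated. File names are random UUIDs and do not encode that order, so this sketch assumes the caller orders files by other means, for example storage object creation time. Within one file, active entries are applied before completed ones, so a DTC that starts and ends in the same file (Scenario 1) does not stay open. The key adds machine to the documented (source, spn, fmi) identity, which is an assumption. track_open_dtcs is a hypothetical name; lamp states could be tracked the same way with (source, type, lamp, flash).

```python
from typing import Dict, Iterable, Tuple

DtcKey = Tuple[str, int, int, int]


def _key(dtc: dict) -> DtcKey:
    # Assumption: machine scopes the documented (source, spn, fmi) identity.
    return (dtc["machine"], dtc["source"], dtc["spn"], dtc["fmi"])


def track_open_dtcs(files: Iterable[dict]) -> Dict[DtcKey, dict]:
    """Replay parsed DTC files in generation order; return DTCs still open."""
    open_dtcs: Dict[DtcKey, dict] = {}
    for doc in files:
        # Apply active before completed so Scenario 1 (same file) nets out to closed.
        for dtc in doc["dtcs"]["active"]:
            open_dtcs[_key(dtc)] = dtc  # a repeated key overwrites the earlier entry
        for dtc in doc["dtcs"]["completed"]:
            open_dtcs.pop(_key(dtc), None)
    return open_dtcs


dtc = {"machine": "60334", "source": 0, "spn": 198, "fmi": 3, "start": 1500016040000}
file_1 = {"dtcs": {"active": [dtc], "completed": []}}
file_2 = {"dtcs": {"active": [], "completed": []}}  # no DTC events
file_n1 = {"dtcs": {"active": [], "completed": [dict(dtc, end=1500016046231)]}}
print(len(track_open_dtcs([file_1, file_2])), len(track_open_dtcs([file_1, file_2, file_n1])))  # 1 0
```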
Example
The following example shows a JSON file containing active and completed DTCs and lamp states:
JSON example data:
```json
{
  "version": 3,
  "dtcs": {
    "active": [
      {
        "machine": "60334",
        "source": 0,
        "spn": 151,
        "fmi": 981,
        "start": 1500016040000
      },
      {
        "machine": "843281",
        "source": 12,
        "spn": 961,
        "fmi": 1423,
        "start": 1500016046231
      }
    ],
    "completed": [
      {
        "machine": "60334",
        "source": 0,
        "spn": 198,
        "fmi": 3,
        "start": 1500016040000,
        "end": 1500016046231
      },
      {
        "machine": "843281",
        "source": 12,
        "spn": 11,
        "fmi": 3,
        "start": 1500016046231,
        "end": 1500016046231
      }
    ]
  },
  "lamps": {
    "active": [
      {
        "machine": "60334",
        "source": 0,
        "type": "mil",
        "lamp": 0,
        "flash": 1,
        "start": 1500016040000
      },
      {
        "machine": "60334",
        "source": 0,
        "type": "rsl",
        "lamp": 1,
        "flash": 0,
        "start": 1500016040000
      },
      {
        "machine": "60334",
        "source": 0,
        "type": "awl",
        "lamp": 0,
        "flash": 0,
        "start": 1500016040000
      },
      {
        "machine": "60334",
        "source": 0,
        "type": "protect",
        "lamp": 1,
        "flash": 1,
        "start": 1500016040000
      },
      {
        "machine": "97976",
        "source": 0,
        "type": "awl",
        "lamp": 1,
        "flash": 1,
        "start": 1500016050000
      }
    ],
    "completed": [
      {
        "machine": "60334",
        "source": 0,
        "type": "mil",
        "lamp": 1,
        "flash": 1,
        "start": 1500016040000,
        "end": 1500016040000
      }
    ]
  }
}
```
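A loader sketch for individual DTC files. Because the documented example file name carries no extension, it detects gzip compression from the gzip magic bytes (0x1f 0x8b) rather than from a .gz suffix, then checks that the mandatory structure described in the data model is present. parse_dtc_bytes, load_dtc_file, and check_shape are hypothetical names.

```python
import gzip
import json
from pathlib import Path


def check_shape(doc: dict) -> None:
    """Raise ValueError unless the mandatory DataPump DTC structure is present."""
    if "version" not in doc:
        raise ValueError("missing 'version'")
    for section in ("dtcs", "lamps"):
        for array in ("active", "completed"):
            if not isinstance(doc.get(section, {}).get(array), list):
                raise ValueError(f"'{section}.{array}' must be a list")


def parse_dtc_bytes(raw: bytes) -> dict:
    """Decode a DTC file body, gzip-compressed or not."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    doc = json.loads(raw)
    check_shape(doc)
    return doc


def load_dtc_file(path: str) -> dict:
    """Read one DTC file from local disk, e.g. after downloading it from the bucket."""
    return parse_dtc_bytes(Path(path).read_bytes())


minimal = {"version": 3, "dtcs": {"active": [], "completed": []},
           "lamps": {"active": [], "completed": []}}
print(parse_dtc_bytes(gzip.compress(json.dumps(minimal).encode()))["version"])  # 3
```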
Service and Support
The latest versions of the drivers, software, firmware, and documentation are available at Document Library.
Do you need help or want to report a bug?
Visit Proemion for more information, or raise a ticket via Support.
Firmware Updates and Support
To ensure the best performance and security of your devices, we strongly recommend always installing the latest firmware provided by Proemion.
Please note:
We do not provide technical support for issues caused by outdated firmware.
Errors resulting from outdated firmware are considered non-qualified errors and are not covered by warranty or support.
Regular firmware updates are essential to maintaining the functionality of your devices.
If you need assistance with the update process, please contact our Service and Support.
For more information on the Firmware Update, check the device manual of your device at the Document Library.