
IDEALEM: Efficient Data Reduction Method with Locally Exchangeable Measures IB-2013-133

Stage: Development
Berkeley Lab researcher Alexander Sim and colleagues have developed IDEALEM (http://datagrid.lbl.gov/idealem/), a dynamic sampling algorithm that reduces the volume of large streaming data while still providing accurate information about the data for analysis. The Berkeley Lab technology could benefit network routers, as part of network monitoring mechanisms; facilities that generate large amounts of data, as a means to reduce data volume; and social networks, among other applications.

Large streaming data are an essential part of computational modeling and network communications. Yet such data are generally intractable to store, compute, search, and retrieve. This dynamic data reduction algorithm detects redundant patterns and reduces data size by exploiting the exchangeability of measurements; it exploits redundancies both within a time series and in the distribution of the data. The Berkeley Lab technology can be used for high-frequency streaming data as well as stored data.

A common technique for reducing the size of collected measurements, in network monitoring and other practices, is to store a random sample, such as one out of every 1,000 network packets. The drawbacks of this approach are that it does not scale to high-frequency streaming data and that the sample is not guaranteed to reflect the underlying data distribution.

Another method is to apply an exact or approximate data compression technique, such as spectral analysis. However, current data compression methods require either the whole data set or data chunks of a designated size, which makes them impractical for large, high-frequency streaming data. Berkeley Lab's algorithm addresses the drawbacks of both approaches.
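The idea of reducing data by exploiting exchangeable measurements can be illustrated with a minimal Python sketch. It is not the Berkeley Lab implementation: the block size, the threshold, the function names (ks_distance, reduce_stream), and the use of a two-sample Kolmogorov-Smirnov distance as the similarity measure are all assumptions made here for illustration. The sketch cuts a stream into fixed-size blocks, stores a block in full only when its value distribution differs from every block already kept, and otherwise records just a reference to the similar block.

```python
import random
from bisect import bisect_right


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of a and b (0 = identical, 1 = disjoint)."""
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    for x in sa + sb:
        cdf_a = bisect_right(sa, x) / len(sa)
        cdf_b = bisect_right(sb, x) / len(sb)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def reduce_stream(stream, block_size=32, threshold=0.3):
    """Hypothetical reduction pass (illustrative parameters, not from the source).

    Returns (kept, encoded, tail):
      kept    - blocks stored in full
      encoded - one index into kept per complete input block
      tail    - leftover values that did not fill a block
    """
    kept, encoded, block = [], [], []
    for value in stream:
        block.append(value)
        if len(block) < block_size:
            continue
        # Look for an already stored block with a similar distribution.
        match = next(
            (i for i, ref in enumerate(kept)
             if ks_distance(block, ref) <= threshold),
            None,
        )
        if match is None:          # new pattern: keep the block itself
            kept.append(block)
            match = len(kept) - 1
        encoded.append(match)      # redundant pattern: keep only a reference
        block = []
    return kept, encoded, block


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(0.0, 1.0) for _ in range(32 * 100)]
    kept, encoded, tail = reduce_stream(data)
    print(f"{len(encoded)} blocks in, {len(kept)} blocks stored in full")
```

Under these assumptions, the data can be approximately reconstructed by replacing each index in encoded with the corresponding kept block. The reconstruction preserves the distribution of values within each block rather than their exact order, which is the trade-off that treating measurements as exchangeable implies. Unlike one-in-1,000 random sampling, every input block is compared against the stored patterns, so a shift in the data distribution produces a newly stored block instead of being missed.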

Applications and Industries

Measurement collection mechanisms for network communications and routers
System logs
Statistical analysis, e.g., financial markets, energy use, social network media
Modeling, e.g., environmental studies, nuclear fusion simulations
Science and engineering experiments


Efficient data size reduction: 47-80% in tests, with much higher potential
Retention of data accuracy
Effective on streaming or stored (offline) data