Big Data Processing and Mining

Big data


Big data from engineering projects and research is transforming our world. The opportunities are vast; so are the challenges, generated by the sheer volume and complexity of this data. Engineering projects linked to remote operations typically generate unstructured data sets of hundreds of gigabytes, a size beyond the capabilities of commonly used software tools. From equipment and safety monitoring to movement sensors and cameras, to satellites and mobile communications, remote engineering data trails are massive.

Our aim is to develop new techniques and systems to manage and make sense of big data.

Problems that we solve

Big data sets contain invaluable knowledge. For example, in marine environments data sets are collected from sensors deployed in floating buoys, underwater remote vehicles and offshore oil and gas platforms. In mining operations, data sets are collected on process control, operating, transportation and maintenance operations. Successful and reliable use of these information channels depends on understanding the patterns within this data. Every remote operations engineering activity involves projects that demand new techniques and systems for managing big data.

Our team brings together a multidisciplinary group of expert researchers. Our work will focus on two main areas: data processing and data mining.

Data mining for engineering intelligence 

Big data is only valuable if quality knowledge can be extracted from it; this process of knowledge extraction is known as data mining. We develop new algorithms to extract knowledge from large volumes of data using high-performance computers and accelerators. Domain-specific techniques are employed first to identify features of interest in the data, and then to mine those features to discover the underlying patterns and structure.
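As a minimal illustration of this two-stage approach (not project code), the sketch below first summarises windows of raw sensor readings as coarse feature symbols, then counts recurring runs of those symbols. The function names, thresholds and sample readings are all illustrative assumptions:

```python
# Hypothetical sketch: extract features from raw readings, then mine the
# feature sequence for recurring patterns. Thresholds are illustrative.
from collections import Counter
from statistics import mean, stdev

def extract_features(readings, window=4):
    """Summarise each fixed-size window of readings as a coarse symbol."""
    symbols = []
    for i in range(0, len(readings) - window + 1, window):
        w = readings[i:i + window]
        level = "high" if mean(w) > 10 else "low"       # illustrative cutoff
        spread = "noisy" if stdev(w) > 2 else "steady"  # illustrative cutoff
        symbols.append(f"{level}-{spread}")
    return symbols

def mine_patterns(symbols, length=2):
    """Count recurring runs of consecutive symbols (a simple n-gram miner)."""
    return Counter(tuple(symbols[i:i + length])
                   for i in range(len(symbols) - length + 1))

readings = [9, 9, 10, 9, 14, 15, 13, 14, 9, 10, 9, 9, 14, 13, 15, 14]
patterns = mine_patterns(extract_features(readings))
print(patterns.most_common(1))  # -> [(('low-steady', 'high-steady'), 2)]
```

Here the alternation between quiet and active periods emerges as the most frequent pattern; in practice the feature extraction step would be replaced by domain-specific signal processing.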

We develop ways to automate the discovery of mining parameters and changes of context, eliminating the need for expert analysis in each new application. This makes knowledge extraction both more accurate and more adaptive to changes in the data over time.
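A simple sketch of this idea, under illustrative assumptions: instead of an expert choosing a fixed anomaly threshold, the threshold is estimated from recent data and updates itself as the context changes. The class name and window sizes are hypothetical:

```python
# Hypothetical sketch: a detection threshold learned from the data itself,
# rather than set by an expert for each new application.
from collections import deque
from statistics import mean, stdev

class AdaptiveDetector:
    """Flags anomalous readings relative to a rolling window of history."""
    def __init__(self, history=20, k=3.0):
        self.window = deque(maxlen=history)  # recent context only
        self.k = k  # how many standard deviations count as anomalous

    def observe(self, value):
        """Return True if value is anomalous given recent history."""
        anomalous = False
        if len(self.window) >= 5:  # wait for a minimal amount of context
            m, s = mean(self.window), stdev(self.window)
            anomalous = abs(value - m) > self.k * s
        self.window.append(value)  # the threshold adapts over time
        return anomalous

detector = AdaptiveDetector()
flags = [detector.observe(v) for v in [10, 11, 10, 9, 10, 100]]
print(flags)  # -> [False, False, False, False, False, True]
```

Because the window is bounded, the detector gradually forgets old behaviour, so a genuine shift in operating regime stops being flagged once it becomes the new normal.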

Data processing for remote operations engineering

The collection, cleaning and efficient processing of big data is essential for its usability. Our research contributes to the following areas:

  • 'Real-time' streaming
    Data that arrives continuously (streaming data) needs to be processed at the time of acquisition for cleaning and efficient storage. We ensure all processing is done in real time, as any lag results in the accumulation of large volumes of unprocessed and unusable data. Efficiency is achieved through high-performance computing, using either special-purpose supercomputers or general-purpose graphics processing unit (GPGPU) accelerators.

  • Data representation
    We design quality data representation frameworks for effective storage, retrieval and servicing of data. Many remote operations require large volumes of data to be analysed in real time, and processing speed depends on well-designed storage frameworks.

  • Unstructured data
    Raw data, such as asset logs and geological records, often mixes many modalities: numerical values, text, images from infrared and optical cameras, audio from microphone and sensor arrays, and video footage. We develop efficient methods for pre-processing and integrating this data.

  • Integration
    Large scale remote engineering projects produce data from many different sources. The viability and sustainability of these projects depend upon the correct and efficient integration of these data sources.

  • Cleaning of data
    Raw data comes with noise and may become unusable if inconsistencies prevent automatic analysis. We ensure the data is clean and fit for purpose.
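As a minimal sketch of cleaning a stream at acquisition time, the generator below repairs out-of-range or missing sensor readings as they arrive, so bad values never accumulate in storage. The valid range and sample values are illustrative assumptions, not project figures:

```python
# Hypothetical sketch: clean a sensor stream at acquisition time, before
# storage. The valid range (-40..125) is an illustrative assumption.
def clean_stream(readings, low=-40.0, high=125.0):
    """Yield cleaned readings: missing or out-of-range values are replaced
    with the last known good value (or skipped if none exists yet)."""
    last_good = None
    for r in readings:
        if r is not None and low <= r <= high:
            last_good = r
            yield r
        elif last_good is not None:
            yield last_good  # repair in-stream rather than store bad data

raw = [21.5, None, 22.0, 999.0, 21.8]  # a dropout and a spurious spike
print(list(clean_stream(raw)))  # -> [21.5, 21.5, 22.0, 22.0, 21.8]
```

Because it is a generator, the same logic can run over an unbounded stream with constant memory, which is the property that matters for real-time acquisition.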