Metadata-Driven Industrial-Grade ETL System
The digital transformation of railway systems based on big data technologies relies on integrating large volumes of streaming data into digitally enabled enterprise systems to form a comprehensive and efficient intelligent transportation system. The data requirements of smart railway transportation involve large volumes of unstructured and semi-structured data, including railway key performance indicator (KPI) data. Traditional extract, transform, load (ETL) technology cannot cope with the fast-growing demand for processing large volumes of real-time data collected from heterogeneous sources, both inside the system and in its environment. Based on the characteristics of railway KPI data, this paper proposes the design of an automated ETL system with greater versatility and data-processing efficiency. To reach these goals, we optimize the ETL workflow using a proprietary metadata management framework. Making ETL suitable for a big-data-driven railway transportation environment requires redesigning the ETL processing rules with a metadata model and then optimizing the extraction, transformation, and loading processes of the ETL system. Our experimental results with actual railway KPI data show that the proposed metadata-supported automated ETL system can effectively process railway KPI data using open-source distributed big data technologies. The proposed metadata framework also proved efficient at handling the complex data structures and large data volumes characteristic of big data.
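The abstract does not specify the authors' metadata model. The sketch below therefore illustrates only the general metadata-driven ETL pattern it refers to: extraction, transformation, and loading rules for a data feed are stored as metadata records, and a single generic engine interprets them. Under this pattern, supporting a new KPI feed means adding metadata rather than writing new pipeline code. Everything here is a hypothetical, single-process illustration, not the paper's framework or a distributed implementation. This includes `FieldRule`, `JobMetadata`, `TRANSFORMS`, `extract_csv`, `transform_record`, `run_job`, the KPI field names, and the sample data.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical rule vocabulary: metadata refers to transformations by name,
# so the engine itself contains no feed-specific logic.
TRANSFORMS: Dict[str, Callable[[Any], Any]] = {
    "strip": lambda v: str(v).strip(),
    "upper": lambda v: str(v).upper(),
    "to_float": float,
}


@dataclass
class FieldRule:
    """Metadata for one output column (hypothetical schema)."""
    source: str                # field name in the extracted record
    target: str                # column name in the loaded record
    transforms: List[str] = field(default_factory=list)
    required: bool = True      # reject the whole record if this field is empty


@dataclass
class JobMetadata:
    """Metadata describing one ETL job for one data feed."""
    job_id: str
    rules: List[FieldRule]


def extract_csv(text: str) -> Iterable[Dict[str, str]]:
    """Extract: parse CSV text into dict records keyed by the header row."""
    return csv.DictReader(io.StringIO(text))


def transform_record(record: Dict[str, Any],
                     meta: JobMetadata) -> Optional[Dict[str, Any]]:
    """Transform: apply the metadata rules; return None to reject the record."""
    out: Dict[str, Any] = {}
    for rule in meta.rules:
        value = record.get(rule.source)
        if value is None or value == "":
            if rule.required:
                return None
            continue  # optional field missing: just omit the column
        try:
            for name in rule.transforms:
                value = TRANSFORMS[name](value)
        except ValueError:
            return None  # e.g. a non-numeric string passed to "to_float"
        out[rule.target] = value
    return out


def run_job(records: Iterable[Dict[str, Any]], meta: JobMetadata,
            sink: List[Dict[str, Any]]) -> Tuple[int, int]:
    """Load: append accepted rows to the sink; return (loaded, rejected)."""
    loaded = rejected = 0
    for rec in records:
        row = transform_record(rec, meta)
        if row is None:
            rejected += 1
        else:
            sink.append(row)
            loaded += 1
    return loaded, rejected


# Usage: a hypothetical punctuality KPI feed described purely by metadata.
meta = JobMetadata("kpi_punctuality", [
    FieldRule("station", "station_code", ["strip", "upper"]),
    FieldRule("delay_min", "delay_minutes", ["to_float"]),
    FieldRule("note", "remark", ["strip"], required=False),
])
raw = "station,delay_min,note\n abc ,3.5,\nxyz,,\ndef,n/a,sensor\n"
sink: List[Dict[str, Any]] = []
result = run_job(extract_csv(raw), meta, sink)
print(result)  # (1, 2)
print(sink)    # [{'station_code': 'ABC', 'delay_minutes': 3.5}]
```

In this sketch, the second row is rejected because a required field is empty, and the third is rejected because "n/a" cannot be converted to a number. Changing the feed's behaviour requires editing only the `JobMetadata` record, which is the versatility argument the abstract makes for metadata-driven rules.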