BigDataStack and the Data-Storage Challenge

One of the goals of BigDataStack is to facilitate scalable data storage through a distributed storage layer. This would enable storage across different resources, while supporting data migration for application components and re-allocation of data services across the infrastructure.

BigDataStack is pursuing this objective by studying an ad hoc data storage architecture with two components. Two partners play a key role in this phase of the project: LeanXcale, with its LXS database, and IBM, with its Cloud Object Store (COS). The former is a fully transactional database that will serve operational workloads; the latter is a cloud object store capable of holding many terabytes of data.

A Seamless Storage Interface (SSI) sits on top of the datastores, providing a uniform way to execute queries over a dataset that is distributed across datastores with different interfaces. Information retrieval is made easier by SparkQL, which has already been integrated with the SSI. This means that the end user can write SparkQL queries and let the SSI determine where the data is located and return the result.
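To make the idea of a single query interface over two datastores more concrete, here is a minimal sketch in Python. It is not the BigDataStack implementation: the class SeamlessStorageFacade, the readings table and the sample rows are all hypothetical, the standard-library sqlite3 module merely stands in for the operational database, and a plain list stands in for objects held in the object store.

```python
import sqlite3


def make_operational_store(rows):
    """Build an in-memory SQL table; a stand-in for the operational database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor TEXT, day TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


class SeamlessStorageFacade:
    """Hypothetical facade: one query method, two backends behind it."""

    def __init__(self, operational_conn, object_store_rows):
        self._conn = operational_conn
        # Assumption: object-store content is modelled as a list of tuples.
        self._objects = list(object_store_rows)

    def readings_for(self, sensor):
        """Return all readings of one sensor, wherever they are stored."""
        from_db = self._conn.execute(
            "SELECT sensor, day, value FROM readings WHERE sensor = ?",
            (sensor,),
        ).fetchall()
        from_objects = [row for row in self._objects if row[0] == sensor]
        # Merge both partial results and order them by day (ISO date strings).
        return sorted(from_db + from_objects, key=lambda row: row[1])


if __name__ == "__main__":
    conn = make_operational_store([("s1", "2020-03-02", 21.5)])
    ssi = SeamlessStorageFacade(conn, [("s1", "2020-01-15", 20.1)])
    for row in ssi.readings_for("s1"):
        print(row)
```

The caller only sees readings_for and never needs to know which backend a given row came from, which is the property the SSI is described as providing.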

As a result, the following scenarios are supported:

  1. Direct access to the LXS: The user executes an SQL query against the LXS database through a standard JDBC interface, and the database returns a result set as the response.
  2. Direct access to the COS: The user executes a query directly against the COS through Stocator, an open-source connector that links object stores to Spark, and the COS returns the result as the response.
  3. Request data using a simple SparkQL query: Depending on the query, SparkQL can push down some operations so that they are executed by the SSI itself.
  4. Request data using a combined query on SparkQL: SparkQL pushes down all operations that concern the LXS database, which makes retrieving data from it much faster. It also pushes down as many operations as the COS supports.
  5. Inserting data into the BigDataStack platform: The data is sent to the Gateway, the entry point for the platform, which transforms data arriving from external sources in various formats into the platform’s internal schema.
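The pushdown behaviour in scenarios 3 and 4 can be illustrated with a small, self-contained Python sketch. Everything in it is an assumption made for illustration: the Source class, the supports_filter_pushdown flag, the run_query function and the sample rows are hypothetical, and which store supports which operation is chosen arbitrarily here rather than taken from the project. The point is only to show why pushing a filter down to the store reduces the number of rows that have to be shipped to the query engine.

```python
class Source:
    """Hypothetical datastore that may or may not evaluate filters itself."""

    def __init__(self, name, rows, supports_filter_pushdown):
        self.name = name
        self.rows = list(rows)
        self.supports_filter_pushdown = supports_filter_pushdown
        self.rows_shipped = 0  # how many rows left the store

    def scan(self, predicate=None):
        """Return rows; a predicate is only accepted if pushdown is supported."""
        if predicate is not None and not self.supports_filter_pushdown:
            raise ValueError(f"{self.name} cannot evaluate filters")
        result = [r for r in self.rows if predicate is None or predicate(r)]
        self.rows_shipped += len(result)
        return result


def run_query(sources, predicate):
    """Push the filter down where possible, otherwise filter in the engine."""
    result = []
    for source in sources:
        if source.supports_filter_pushdown:
            # The store filters; only matching rows are transferred.
            result.extend(source.scan(predicate))
        else:
            # The store returns everything; the engine filters afterwards.
            result.extend(r for r in source.scan() if predicate(r))
    return result


if __name__ == "__main__":
    operational = Source(
        "operational",
        [{"id": 1, "value": 5}, {"id": 2, "value": 50}],
        supports_filter_pushdown=True,
    )
    objects = Source(
        "objects",
        [{"id": 3, "value": 70}, {"id": 4, "value": 1}],
        supports_filter_pushdown=False,
    )
    rows = run_query([operational, objects], lambda r: r["value"] > 10)
    print(rows)
    print(operational.rows_shipped, objects.rows_shipped)
```

Both sources contribute matching rows to the final result, but the one that accepts the filter ships only those rows, whereas the other ships its full content first. The more operations a store can execute itself, the less data has to move.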

 

