Databricks unveils several advancements at the Data + AI Summit
Databricks, the Data and AI company pioneering the data lakehouse paradigm, has announced several advancements in major open source data and AI projects including Delta Lake, MLflow and Apache Spark.
At the Data + AI Summit, the largest gathering of the data and AI open source community, Databricks announced that it intends to contribute all the features and enhancements it has made to Delta Lake to the Linux Foundation, and to open source all Delta Lake APIs as part of the Delta Lake 2.0 launch.
Additionally, the company announced MLflow 2.0, which includes MLflow Pipelines, a new feature to accelerate and simplify ML model implementations. Finally, the company unveiled Spark Connect, to allow the use of Spark on any device, and Project Lightspeed, a next-generation Spark Structured Streaming engine for streaming data on a lakehouse.
“From the beginning, Databricks has been committed to open standards and the open source community. We have created, contributed to, donated and fostered the growth of some of the most impactful open source technologies in existence,” said Ali Ghodsi, Co-Founder and CEO of Databricks.
Delta Lake 2.0 will offer all Delta Lake users unprecedented query performance and will allow everyone to build a high-performance data lakehouse on open standards. Thanks to this contribution, Databricks customers and the open source community will be able to benefit from all the features and improved performance of Delta Lake 2.0.
“Databricks provides Akamai with an open and certified table storage format for particularly complex workloads such as ours. The lakehouse enables interactive analytics across any volume of data, so our customers can analyze security events on our Edge platform in near real time,” said Aryeh Sivan, Akamai's VP Engineering.
“We are excited about the rapid evolution that Databricks, along with the rapidly growing community, is bringing to Delta Lake. We look forward to collaborating with the other developers on the project to drive the data community to greater results.”
Delta Lake is experiencing tremendous growth and activity, a sign that the developer community wants to be part of this project. Contributor participation has increased by 60% in the past year, the number of commits has grown by 95%, and the average number of lines of code per commit has increased by 900%.
MLflow, one of the most successful open source machine learning (ML) projects, has set the standard for ML platforms. The launch of MLflow 2.0 introduces MLflow Pipelines to the platform, substantially reducing the time to production and improving execution at scale through standardization.
MLflow Pipelines offers data scientists pre-defined, production-ready templates based on the type of model they are developing, allowing them to bootstrap reliably and accelerate model development without the intervention of production engineers.
As the leading unified engine for large-scale data analytics, Spark can handle datasets of all sizes. However, the lack of remote connectivity and the overhead of applications developed and run on the driver node make it hard to meet the requirements of modern data applications.
To solve this problem, Databricks introduced Spark Connect, a client and server interface for Apache Spark based on the DataFrame API that decouples the client from the server for better stability and enables built-in remote connectivity. With Spark Connect, users will be able to access Spark from any device.
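The client/server split described here can be illustrated with a small, self-contained toy model. This sketch is not the Spark Connect API: the class `ThinClient`, the function `serve`, the JSON plan format and the sample `events` table are all hypothetical names invented for illustration. It only shows the general idea that a lightweight client can describe DataFrame-style operations and send them to a server, which executes them next to the data.

```python
import json

# Hypothetical toy model, not the Spark Connect API: the client only
# describes the work; the server owns the data and executes the plan.


class ThinClient:
    """Builds a DataFrame-style plan without touching any data."""

    def __init__(self, table):
        self.plan = [{"op": "read", "table": table}]

    def filter(self, column, minimum):
        # Record the filter step; nothing is evaluated on the client.
        self.plan.append({"op": "filter", "column": column, "min": minimum})
        return self

    def select(self, *columns):
        self.plan.append({"op": "select", "columns": list(columns)})
        return self

    def to_request(self):
        # The serialized plan is the only thing that crosses the network.
        return json.dumps(self.plan)


def serve(request, tables):
    """Server side: decode the plan and run it against local tables."""
    rows = []
    for step in json.loads(request):
        if step["op"] == "read":
            rows = list(tables[step["table"]])
        elif step["op"] == "filter":
            rows = [r for r in rows if r[step["column"]] >= step["min"]]
        elif step["op"] == "select":
            rows = [{c: r[c] for c in step["columns"]} for r in rows]
    return rows


# Sample data that lives only on the "server" side (made up for the sketch).
TABLES = {
    "events": [
        {"id": 1, "severity": 2},
        {"id": 2, "severity": 7},
        {"id": 3, "severity": 9},
    ]
}

request = ThinClient("events").filter("severity", 5).select("id").to_request()
result = serve(request, TABLES)
print(result)
```

Because the client holds only a description of the query, it can stay small and run on a constrained device, while a failure in the client process does not bring down the process that holds the data.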
In partnership with the Spark community, Databricks also announced Project Lightspeed, the next-generation Spark streaming engine. As the variety of applications that rely on data streaming has grown, new requirements have emerged for supporting lakehouse and data streaming workloads.
Spark Structured Streaming has been widely adopted since the early days of streaming thanks to its ease of use, performance, large ecosystem and developer communities.
With this in mind, Databricks will collaborate with the community and encourage participation in Project Lightspeed to improve performance, expand ecosystem support for connectors, enhance data processing capabilities with new operators and APIs, and simplify deployment, operations, monitoring and troubleshooting.