Title
Milvus: A Purpose-Built Vector Data Management System
Abstract
Recently, there has been a pressing need to manage high-dimensional vector data in data science and AI applications. This trend is fueled by the proliferation of unstructured data and machine learning (ML), where ML models usually transform unstructured data into feature vectors for data analytics, e.g., product recommendation. Existing systems and algorithms for managing vector data have two limitations: (1) they incur serious performance issues when handling large-scale and dynamic vector data; and (2) they provide limited functionality that cannot meet the requirements of versatile applications. This paper presents Milvus, a purpose-built data management system to efficiently manage large-scale vector data. Milvus supports easy-to-use application interfaces (including SDKs and RESTful APIs); optimizes for the heterogeneous computing platform with modern CPUs and GPUs; enables advanced query processing beyond simple vector similarity search; handles dynamic data for fast updates while ensuring efficient query processing; and distributes data across multiple nodes to achieve scalability and availability. We first describe the design and implementation of Milvus. Then we demonstrate the real-world use cases supported by Milvus. In particular, we build a series of 10 applications (e.g., image/video search, chemical structure analysis, COVID-19 dataset search, personalized recommendation, biological multi-factor authentication, intelligent question answering) on top of Milvus. Finally, we experimentally evaluate Milvus against a wide range of systems, including two open-source systems (Vearch and Microsoft SPTAG) and three commercial systems. Experiments show that Milvus is up to two orders of magnitude faster than the competitors while providing more functionality. Milvus is now deployed by hundreds of organizations worldwide and is recognized as an incubation-stage project of the LF AI & Data Foundation.
Milvus is open-sourced at https://github.com/milvus-io/milvus.
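To make the notion of "simple vector similarity search" in the abstract concrete, the following is a minimal sketch of exhaustive top-k search over feature vectors. It is not Milvus's API or its indexing method: the function names, the `catalog` data, and the choice of cosine similarity are all assumptions made for illustration, and a system of the kind described would replace the linear scan with specialized indexes and hardware-aware execution.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, vectors, k):
    """Score every stored vector against the query and return the k best (id, score) pairs."""
    scored = ((vec_id, cosine_similarity(query, vec)) for vec_id, vec in vectors.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical feature vectors, standing in for the output of an ML model.
catalog = {
    "shoe": [0.9, 0.1, 0.0],
    "boot": [0.8, 0.2, 0.1],
    "lamp": [0.0, 0.1, 0.9],
}

# Find the two catalog entries closest to a query vector.
results = top_k_similar([1.0, 0.0, 0.0], catalog, k=2)
```

The scan costs time proportional to the number of stored vectors times their dimensionality per query, which is the scalability problem at large data sizes that the abstract alludes to.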
Year
2021
DOI
10.1145/3448016.3457550
Venue
International Conference on Management of Data
Keywords
Vector database, High-dimensional similarity search, Heterogeneous computing, Data science, Machine learning
DocType
Conference
ISSN
0730-8078
Citations
0
PageRank
0.34
References
0
Authors (22)
Name            Order  Citations  PageRank
Jianguo Wang    1      0          0.34
Xiaomeng Yi     2      0          0.34
Rentong Guo     3      0          0.34
Hai Jin         4      6544644.63
Peng Xu         5      0          0.34
Shengjun Li     6      0          0.34
Xiangyu Wang    7      7623.91
Xiangzhou Guo   8      0          0.34
Chengming Li    9      6310.60
Xiaohai Xu      10     0          0.34
Kun Yu          11     387.33
Yuxing Yuan     12     0          0.34
Yinghao Zou     13     0          0.34
Jiquan Long     14     0          0.34
Yu-Dong Cai     15     34034.45
Zhenxiang Li    16     0          0.34
Zhifeng Zhang   17     0          0.34
Yihua Mo        18     0          0.34
Jun Gu          19     0          0.34
Ruiyi Jiang     20     0          0.34
Yi Wei          21     0          0.34
Charles Xie     22     0          0.34