Picture of Daniel Lemire

 is a full professor in computer science. His research is focused on indexing techniques and data science. For example, he works on bitmap indexes and integer-compression techniques. He also works on database design and probabilistic algorithms (e.g., universal hashing). He likes to debate and think about the effects of recent technology on our civilization. 

His work on bitmap indexes is used by companies such as eBay, Facebook, LinkedIn and Netflix in their data warehouses within big-data frameworks such as Apache HiveDruid,  Netflix Atlas, LinkedIn PinotApache Spark and Apache KylinGit, the ubiquitous version control system uses his techniques to accelerate queries. Some of his compression software is used by Apache Arrow and Apache Impala. His indexing techniques are used to accelerate medical research. Some of his techniques have been adopted by Apache Lucene, the search engine behind sites such as  Wikipedia and frameworks such as Solr and Elastic. One of his hashing techniques can be found in Google's Tensorflow where it improves performance. His Slope One algorithm is a reference in recommender systems.

He published over 25 articles in science journals and about 50 refereed papers. He has held competitive grants for over fifteen years. He regularly serves as an expert in prestigious program committees (e.g., ACM CIKM, ACM WSDM, ACM SIGIR, ACM RecSys). 

He works regularly on open source software libraries. He programs regularly in C, C++, Java, JavaScript, Python, Swift and Go. He works primarily in an open-source setting (e.g., Linux).

He is a long-time social media user: his blog has thousands of readers and was featured on slashdot, reddit and hacker news. He was one of the first Twitter users: @lemire.

Some recent research papers

Students

Daniel Lemire regularly supervises students, from the undergraduate to the Ph.D. level. He works primarily with students who love to program and who prefer an open source setting (e.g., Linux). Many of his students contribute to open-source projects on sites such as GitHub.

He recently supervised the following Ph.D. students: 

  • Jing Li, computer science (graduated in 2016);
  • Samy Chambi, computer science (graduated in 2016);
  • Hazel Webb, computer science (graduated in 2010).

Education

  • Postdoctorate (Institute of Biomedical Engineering)
  • Engineering Mathematics Ph.D. (University of Montreal and Polytechnique Montréal)
  • Master in Mathematics (University of Toronto)
  • Bachelor degree in Mathematics (University of Toronto), with High Distinction

Research Interests

  • Data Science
  • Data Indexing
  • Data Engineering
  • Software Performance
  • Vectorization (SIMD)

Teaching

Research

Research program

We seek to accelerating sofware indexing techniques, either within search engines or within databases. In this work, we exploit recent and emerging hardware capabilities. In particular, we seek to fully benefit from vector instructions. To keep the memory close to the processor, we seek to improve index compression, whether they are inverted indexes, B-trees or bitmap indexes. We seek to uncompress data at great speed in RAM. We want to accelerate common operations such as intersections and unions.

Current research grants

  • Faster Compressed Indexes On Next ­Generation Hardware (details withheld by request of the funding agency)
  • Data reordering for better compression in databases (NSERC Discovery Grant, 2012-2017) : $140,000
  • Adapting forests to global change through high-tech field monitoring, transplantation experiments and simulation models (John R. Evans Leaders Fund with N. Bélanger [PI] and E. Filotas): $800,000

Publications & Presentations

Journal articles (refereed)

Lemire, Daniel; Kurz, Nathan, & Rupp, Christoph (2018). Stream VByte: Faster byte-oriented integer compression. Information Processing Letters, 130. https://doi.org/10.1016/j.ipl.2017.09.011

Ivanchykhin, Dmytro; Ignatchenko, Sergey, & Lemire, Daniel (2017). Regular and almost universal hashing: an efficient implementation. Software: Practice and Experience, 47 (10). https://doi.org/10.1002/spe.2461

Muła, Wojciech; Kurz, Nathan, & Lemire, Daniel (In Press). Faster population counts using AVX2 instructions. Computer Journal. https://doi.org/10.1093/comjnl/bxx046

Lemire, Daniel, & Rupp, Christoph (2017). Upscaledb: Efficient Integer-Key Compression in a Key-Value Store using SIMD Instructions. Information Systems, 66, 13–23. https://doi.org/10.1016/j.is.2017.01.002

Badia, Antonio, & Lemire, Daniel (In Press). On Desirable Semantics of Functional Dependencies over Databases with Incomplete Information. Fundamenta Informaticae.

Lemire, Daniel; Ssi-Yan-Kai, Gregory, & Kaser, Owen (2016). Consistently faster and smaller compressed bitmaps with Roaring. Software: Practice and Experience, 46 (11), 1547-1569. https://doi.org/10.1002/spe.2402

Li, Jing; Yan, Yuhong, & Lemire, Daniel (2016). Full Solution Indexing for top-K Web Service Composition. IEEE Transactions on Services Computing (99). https://doi.org/10.1109/TSC.2016.2578924

Chambi, Samy; Lemire, Daniel, & Godin, Robert (2016). Vers de meilleures performances avec des Roaring bitmaps. Technique et Science Informatiques, 35 (3), 335-355.

Lemire, Daniel, & Boytsov, Leonid (2015). Decoding billions of integers per second through vectorization. Software: Practice & Experience, 45 (1), 1-29. https://doi.org/10.1002/spe.2203

Lemire, Daniel, & Kaser, Owen (2014). Strongly universal string hashing is fast. Computer Journal, 57 (11), 1624-1638. https://doi.org/10.1093/comjnl/bxt070

Webb, Hazel; Lemire, Daniel, & Kaser, Owen (2013). Diamond dicing. Data & Knowledge Engineering, 86. https://doi.org/10.1016/j.datak.2013.01.001

Lemire, Daniel; Kaser, Owen, & Gutarra, Eduardo (2012). Reordering rows for better compression: Beyond the lexicographic order. ACM Transactions on Database Systems, 37 (3). https://doi.org/10.1145/2338626.2338627

Zhu, Xiaodan; Turney, Peter; Lemire, Daniel, & Vellino, Andre (2015). Measuring academic influence: Not all citations are equal. Journal of the Association for Information Science and Technology, 66 (2), 408-427. https://doi.org/10.1002/asi.23179

Badia, Antonio, & Lemire, Daniel (2015). Functional dependencies with null markers. Computer Journal, 58 (5), 1160-1168. https://doi.org/10.1093/comjnl/bxu039

Kaser, Owen, & Lemire, Daniel (2006). Attribute value reordering for efficient hybrid OLAP. Information Systems, 176 (16), 2304-2336. https://doi.org/10.1016/j.ins.2005.09.005

Lemire, Daniel (2006). Streaming maximum-minimum filter using no more than three comparisons per element. Nordic Journal of Computing, 13 (4), 328-339.

Lemire, Daniel (2005). Scale and translation invariant collaborative filtering systems. Information Retrieval, 8 (1), 129-150. https://doi.org/10.1023/B:INRT.0000048492.50961.a6

Lemire, Daniel; Boley, Harold; McGrath, Sean, & Ball, Marc (2005). Collaborative filtering and inference rules for context-aware learning object recommendation. Interactive Technology and Smart Education, 2 (3). https://doi.org/10.1108/17415650580000043

Dubuc, Serge; Lemire, Daniel, & Merrien, Jean-Louis (2001). Fourier analysis of 2-point Hermite interpolatory subdivision schemes. Journal of Fourier Analysis and Applications, 7 (5), 532-552. https://doi.org/10.1007/BF02511225

Lemire, Daniel, & Kaser, Owen (2008). Hierarchical Bin Buffering: Online Local Moments for Dynamic External Memory Arrays. ACM Transactions on Algorithms, 4 (1), 1-31. https://doi.org/10.1145/1328911.1328925

Lemire, Daniel; Brooks, Martin, & Yan, Yuhong (2009). An optimal linear time algorithm for quasi-monotonic segmentation. International Journal of Computer Mathematics, 86 (7). https://doi.org/10.1080/00207160701694153

Lemire, Daniel (2009). Faster retrieval with a two-pass dynamic-time-warping lower bound. Pattern Recognition, 42 (9). https://doi.org/10.1016/j.patcog.2008.11.030

Lemire, Daniel, & Kaser, Owen (2010). Recursive n-gram hashing is pairwise independent, at best. Computer Speech & Language, 24 (4), 698-710. https://doi.org/10.1016/j.csl.2009.12.001

Lemire, Daniel; Kaser, Owen, & Aouiche, Kamel (2010). Sorting improves word-aligned bitmap indexes. Data & Knowledge Engineering, 69 (1), 3-28. https://doi.org/10.1016/j.datak.2009.08.006

Badia, Antonio, & Lemire, Daniel (2011). A call to arms: Revisiting database design. SIGMOD Record, 40 (3), 61-69. https://doi.org/10.1145/2070736.2070750

Lemire, Daniel, & Kaser, Owen (2011). Reordering Columns for Smaller Indexes. Information Sciences, 181 (12), 2550–2570. https://doi.org/10.1016/j.ins.2011.02.002

Lemire, Daniel (2012). The universality of iterated hashing over variable-length strings. Discrete Applied Mathematic, 160 (4-5), 604–617. https://doi.org/10.1016/j.dam.2011.11.009

Neylon, Cameron; Aerts, Jan; Brown, C. Titus; Coles, Simon J.; Hatton, Les; Lemire, Daniel; Millman, K. Jarrod; Murray-Rust, Peter; Perez, Fernando; Saunders, Neil; Shah, Nigam; Smith, Arfon; Varoquaux, Gaël, & Willighagen, Egon (2012). Changing computational research. The challenges ahead. Source Code for Biology and Medicine, 7 (2). https://doi.org/10.1186/1751-0473-7-2

Prekopcsák, Zoltán, & Lemire, Daniel (2012). Time Series Classification by Class-Specific Mahalanobis Distance Measures. Advances in Data Analysis and Classification, 6 (3). https://doi.org/10.1007/s11634-012-0110-6

Kaser, Owen, & Lemire, Daniel (2016). Compressed bitmap indexes: beyond unions and intersections. Software: Practice and Experience, 46 (2). https://doi.org/10.1002/spe.2289

Crainiceanu, Adina, & Lemire, Daniel (2015). Bloofi : Multidimensional Bloom Filters. Information Systems, 54. https://doi.org/10.1016/j.is.2015.01.002

Zhao, Wayne Xin; Zhang, Xudong; Lemire, Daniel; Shan, Dongdong; Nie, Jian-Yun; Yan, Hongfei, & Wen, Ji-Rong (2015). A General SIMD-based Approach to Accelerating Compression Algorithms. ACM Transactions on Information Systems, 33 (3). https://doi.org/10.1145/2735629

Chambi, Samy; Lemire, Daniel; Kaser, Owen, & Godin, Robert (2016). Better bitmap performance with Roaring bitmaps. Software: Practice and Experience, 45 (5), 709–719. https://doi.org/10.1002/spe.2325

Lemire, Daniel, & Kaser, Owen (2016). Faster 64-bit universal hashing using carry-less multiplications. Journal of Cryptographic Engineering, 6 (3), 171-185. https://doi.org/10.1007/s13389-015-0110-5

Book chapters

Aouiche, Kamel; Lemire, Daniel, & Godin, Robert (2009). Web 2.0 OLAP: From data cubes to tag clouds. In Web Information Systems and Technologies. 4th International Conference, WEBIST 2008, Funchal, Madeira, Portugal, May 4-7, 2008, Revised Selected Papers. Springer, coll. « Lecture Notes in Business Information Processing », vol. 18.

Noël, Sylvie, & Lemire, Daniel (2010). On the Challenges of Collaborative Data Processing. In Foster, Jonathan (Ed.), Collaborative Information Behaviour. User Engagement and Communication Sharing (p. 55-71). IGI Global : IGI Global.

Papers in Conference Proceedings (refereed)

Chambi, Samy; Lemire, Daniel, & Godin, Robert (2016). Nouveaux modèles d’index bitmap compressés à 64 bits. In Actes des 12es journées francophones sur les Entrepôts de Données et l'Analyse en Ligne.

Chambi, Samy; Lemire, Daniel; Godin, Robert; Boukhalfa, Kamel; Allen, Charles, & Yang, Fangjin (2016). Optimizing Druid with Roaring bitmaps. In Proceedings of the 20th International Database Engineering & Applications Symposium. ACM. ISBN 978-1-4503-4118-9 https://doi.org/10.1145/2938503.2938515

Li, Jing; Yan, Yuhong, & Lemire, Daniel (In Press). Scaling up Web Service Composition with the Skyline Operator. In IEEE International Conference on Web Services 2016.

Li, Jing; Yan, Yuhong, & Lemire, Daniel (2015). A web service composition method based on compact K2-trees. In IEEE International Conference on Services Computing (p. 403 - 410). IEEE. ISBN 978-1-4673-7280-0 https://doi.org/10.1109/SCC.2015.62

Chambi, Samy; Lemire, Daniel, & Godin, Robert (2014). Roaring bitmap : nouveau modèle de compression bitmap. In Actes des 10e journées francophones sur les Entrepôts de Données et l'Analyse en Ligne.

Li, Jing; Yan, Yuhong, & Lemire, Daniel (2014). Full Solution Indexing Using Database for QoS-aware Web Service Composition. In IEEE International Conference on Services Computing (p. 99 - 106). IEEE. ISBN 978-1-4799-5065-2 https://doi.org/10.1109/SCC.2014.22

Aouiche, Kamel; Lemire, Daniel, & Godin, Robert (2008). Collaborative OLAP with Tag Clouds: Web 2.0 OLAP Formalism and Experimental Evaluation. In Proceedings of WEBIST 2008. Portugal : Institute for Systems and Technologies of Information, Control and Communication.

Aouiche, Kamel, & Lemire, Daniel (2007). A Comparison of Five Probabilistic View-Size Estimation Techniques in OLAP. In Proceedings of the 10th International Workshop on Data Warehousing and OLAP. ACM.

Aouiche, Kamel, & Lemire, Daniel (2007). Unasssuming View-Size Estimation Techniques in OLAP. In Proceedings of the 9th International Conference on Enterprise Information Systems. Portugal : INSTICC.

Kaser, Owen, & Lemire, Daniel (2007). Removing Manually-Generated Boilerplate from Electronic Texts: Experiments with Project Gutenberg e-Books. In Spencer, Bruce; Story, Margaret-Ann, & Stewart, Darlene (Ed.), Proceedings of the 2007 Conference of the Center for Advanced Studies on Collaborative Research (CASCON '07). Riverton, NJ, É.-U. : IBM.

Kaser, Owen, & Lemire, Daniel (2007). Tag-Cloud Drawing: Algorithms for Cloud Visualization. In Proceedings of the Tagging and Metadata for Social Information Organization Workshop, 16th International World Wide Web Conference (WWW 2007). Banff, Canada : IW3C2.

Kucerovsky, Dan, & Lemire, Daniel (2007). Monotonicity Analysis over Chains and Curves. In Curve and surface fitting: Avignon 2006 (p. 180-190). Brentwood, TN, É.-U. : Nashboro Press.

Kaser, Owen; Lemire, Daniel, & Keith, Steven (2006). The LitOLAP Project: Data Warehousing with Literature. In Proceedings of the 2006 CaSTA Conference. University of New Brunswick.

Brooks, Martin; Yan, Yuhong, & Lemire, Daniel (2005). Scale-Based Monotonicity Analysis in Qualitative Modelling with Flat Segments. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence. Edinburgh, UK : IJICAI.

Lemire, Daniel (2005). A Better Alternative to Piecewise Linear Time Series Segmentation. In Apte, Chid; Skillicorn, David; Liu, Bing, & Parthasara, Srinivasan (Ed.), Proceedings of the 2007 SIAM International Conference on Data Mining (SDM'07) (p. 545-550). Minneapolis, Minnesota : SIAM. https://doi.org/10.1137/1.9781611972771.59

Lemire, Daniel; Brooks, Martin, & Yan, Yuhong (2005). An Optimal Linear Time Algorithm for Quasi-Monotonic Segmentation. In Han, Jiawei; Wah, Benjamin W.; Vijay, Raghavan; Wu, Xindong, & Rastogi, Rajeev (Ed.), Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM-05) (p. 709-712). Piscataway, NJ : IEEE. https://doi.org/10.1109/ICDM.2005.25

Lemire, Daniel, & Maclachlan, Anna (2005). Slope One Predictors for Online Rating-Based Collaborative Filtering. In Kargupta, Hillol; Srivastava, Jaideep; Kamath, Chandrika, & Goodman, Arnold (Ed.), Proceedings of the 2005 SIAM International Conference on Data Mining (SDM'05) (p. 471-475). Newport Beach, CA : SIAM.

Kaser, Owen, & Lemire, Daniel (2003). Attribute Value Reordering for Efficient Hybrid OLAP. In Rizzi, Stefano, & Song, Il-Yeol (Ed.), Proceedings of the ACM Sixth International Workshop on Data Warehousing and OLAP (p. 1-8). New Orleans, LA : ACM.

Lemire, Daniel (2003). A Family of 4-Point Dyadic Multistep Subdivision Schemes. In Cohen, Albert; Merrien, Jean-Louis, & Scumaker, Larry L. (Ed.), Curves and Surface Fitting: Saint-Malo 2002 (p. 259-268). Brentwood, TN, USA : Nashboro Press.

Lemire, Daniel (2002). Wavelet-Based Relative Prefix Sum Methods for Range Sum Queries in Data Cubes. In Stewart, Darlene A., & Johnson, J. Howard (Ed.), Proceedings of the 2002 Conference of the Center for Advanced Studies on Collaborative Research (CASCON '02) (p. 6). Riverton, NJ, USA : IBM.

Webb, Hazel; Kaser, Owen, & Lemire, Daniel (2008). Pruning Attributes From Data Cubes with Diamond Dicing. In IDEAS '08 Proceedings of the 2008 international symposium on Database engineering & applications. ACM International Conference Proceeding Series.

Kaser, Owen; Lemire, Daniel, & Aouiche, Kamel (2008). Histogram-Aware Sorting for Enhanced Word-Aligned Compression in Bitmap Indexes. In Proceedings of the 11th ACM International Workshop on Data Warehousing and OLAP. ACM.

Lemire, Daniel, & Vellino, Andre (2011). Extracting, Transforming and Archiving Scientific Data. In Proceedings of the Fourth Workshop on Very Large Digital Libraries. DELOS Association for Digital Libraries.

Ruer, Perrine; Gouin-Vallerand, Charles; Zhang, Le; Lemire, Daniel, & Vallières, Évelyne F. (2015). An analysis tool for the contextual information from field experiments on driving fatigue. In Proceeding of the Ninth International and Interdisciplinary Conference on Modeling and Using Context (Context 2015). Springer, coll. « LNAI ».

Anderson, Michelle; Ball, Marcel; Boley, Harold; Greene, Stephen; Howse, Nancy; Lemire, Daniel, & McGrath, Sean (2003). RACOFI: A Rule-Applying Collaborative Filtering System. In IEEE/WIC COLA 2003.

Plaisance, Jeff; Kurz, Nathan, & Lemire, Daniel (2015). Vectorized VByte Decoding. In First International Symposium on Web Algorithms.

Conference presentations (refereed)

Aouiche, Kamel; Lemire, Daniel, & Kaser, Owen (Jun 2008). Tri de la table de faits et compression des index bitmaps avec alignement sur les mots. Paper presented at the 24ièmes journées 'Bases de Données Avancées'.

Awards & Honors

Industry prizes

  • Google Open Source Peer Bonus Program (2012)

Paper awards

  • Best student paper award (IEEE SCC 2014)
  • Best paper award (CASCON 2002)

Community Service

PUBLIC appearances

Program committee (international conferences)

  • ACM Conference on Information and Knowledge Management (ACM CIKM)
  • ACM Conference on Web Search and Data Mining (ACM WSDM)
  • ACM Conference on Information Retrieval (ACM SIGIR)
  • ACM Conference on Recommender Systems (ACM RecSys)
  • ACM/IEEE Joint Conference on Digital Libraries (JCDL)

Funding bodies

  • FQRNT: review committee 03F (theoretical computer science) since 2007.
  • FQRNT: review committee 309 (team projects in computer science) since 2006.
  • NSERC: Research Tools and Instruments Grants Program (2012-2015)
  • NSERC: Computer Science Evaluation Group (EG 1507) for the Discovery Grants Program (2018-2020)

external referee (Ph.D.)

  • Mohammed Shaaban at Université Pierre et Marie Curie (2017) - supervised by Patrick Garda.
  • Mehdi Boukhechba at UQAC (2016) - supervised by Abdenour Bouzouane and Charles Gouin-Vallerand.
  • Hicham Assoudi at UQAM (2016) - supervised by Hakim Lounis.
  • Khaled Dehdouh at Lyon 2 (2015) - supervised by Omar Boussaid.
  • Martin Leginus at Aalborg University (2015) - supervised by Peter Dolog.
  • Ahmad Taleb at Université Concordia (2011) - supervised by Todd Eavis.

EXTERNAL REFEREE (Promotion)

  • Sabine Loudcher Rabaseda at Université Lyon2 - habilitation.
  • Jason Sawin at Université of St. Thomas.
  • Amer Nizar AbuAli at Philadelphia University.
  • Jinan Fiaidhi at Lakehead University.