PDEP-0002 Build System Overhaul #47988

Closed

WillAyd
Member

@WillAyd WillAyd commented Aug 6, 2022

Cmake POC: #47380
Meson POC: lithomas1#19

Contributor

@jreback jreback left a comment

pls add the most important point here

scipy and numpy have already adopted meson

we need a really really good reason to use cmake over meson

so list any reasons

@WillAyd
Member Author

WillAyd commented Aug 6, 2022

I have that already in the Meson section. Do you want that moved somewhere else?

@mroeschke added the Build (library building on various platforms) and PDEP (pandas enhancement proposal) labels Aug 8, 2022
Member

@mroeschke mroeschke left a comment

IMO, it would be good to include your opinion on which build system to use in this proposal.

@WillAyd
Member Author

WillAyd commented Aug 9, 2022

I am biased by using it with Arrow, but I find CMake relatively simple to use. Compared to Meson there is way more literature available. You can see this in the Stack Overflow tag comparison but also on sites like O'Reilly, where there are books like Modern CMake for C++, CMake Best Practices and CMake Cookbook. Might be a semantics issue, but I can't find any books on Meson.

@eli-schwartz
Contributor

(Many would argue that the large number of third-party books for cmake exists because the official cmake docs are lacking, particularly given the focus on describing "modern" cmake, whereas Meson tends to be fairly aggressive about simply raising deprecation warnings if you use things that are no longer recommended. Admittedly, as a core committer for Meson, I'm somewhat biased... take my buildsystem comparisons with several large grains of salt.)

There's at least one book, that the lead Meson developer wrote for sale: https://meson-manual.com/

It ends up being free these days: https://nibblestew.blogspot.com/2021/12/this-year-receive-gift-of-free-meson.html

(I never did end up reading it, though. It was of course never a priority for me because, having hacked extensively on Meson, I acquired the same information myself.)

@eli-schwartz
Contributor

One of the big differences that people usually point out between meson and cmake is that cmake allows third-party modules, and user defined functions. Meson doesn't (avoiding Turing completeness and recommending that support for something be added directly to Meson itself). @rgommers commented on this in the blog post about moving SciPy to Meson:

> This seems painful, but it guarantees that people don't just copy around changes from project to project and long-term maintainability deteriorates. Instead, the philosophy is to fix things once for all users.

Obviously YMMV, and this is in fact a dealbreaker for some people.

In return, Meson provides real object types (and type safety) for primitives (strings, integers, booleans, dictionaries and arrays) and build targets, and a module system that supports various broadly useful things -- including the python module that directly handles much of what a build system for python projects would want to do anyway.

@WillAyd
Member Author

WillAyd commented Aug 9, 2022

FWIW I also think either will represent a good upgrade over setuptools. I'm not sure we need to decide on one versus the other as part of this PDEP as much as agree to move away from setuptools. At some point I think we will have implementations of both to compare.

@lithomas1
Member

At this point, I think both POCs have reached the point at which they can reliably compile pandas correctly. We might want to benchmark cmake vs meson and note it in the PDEP.

> scipy and numpy have already adopted meson
>
> we need a really really good reason to use cmake over meson

I don't think consistency with numpy/scipy is a good reason here. IIUC, numpy and scipy have far more complex builds (e.g. linking with openblas, and scipy does some linking thing with npymath and co.).

The biggest drawback, and also the biggest advantage, of meson is that it's still not very mature when it comes to building Python packages (esp. with C/Cython extensions). The upside is that meson can still be shaped to accommodate our needs in a build system (e.g. native Cython support), and the meson developers have been really receptive to feedback (thanks a bunch @eli-schwartz). Unfortunately, the downside of this is that meson and especially the glue for meson/PEP 517 frontends are still somewhat buggy, and documentation is sparse (I've been mostly reverse engineering scipy's meson files).

@rgommers
Contributor

> Unfortunately, the downside of this is that meson and especially the glue for meson/PEP 517 frontends are still somewhat buggy,

True as of right now, but given that we just have a first release of SciPy out that defaults to meson-python as the build backend, we're finding most/all pain points. So I expect this to not be much of an issue by the time Pandas 1.6.0 is nearing a release (I'd assume a new build system doesn't go in before 1.5.0 anymore?).

> and documentation is sparse (I've been mostly reverse engineering scipy's meson files).

meson-python needs some more docs indeed, and I think there's also a general familiarity issue there - PEP 517 backends behave differently from the "default to setuptools" that everyone used till very recently.

For the Meson parts, or for something like "best practices for PyData packages with Cython/C/C++", it'd be great to hear if you have concrete ideas of what is most important to document, or where to do so. I'd be happy to work on this.

@WillAyd
Member Author

WillAyd commented Aug 14, 2022

> At this point, I think both POCs have reached the point at which they can reliably compile pandas correctly. We might want to benchmark cmake vs meson and note it in the PDEP.

I'm not sure performance is the most important thing, since building isn't a constant part of our development process, but in any case here are some observed timings from my laptop. It runs Ubuntu 22.04 LTS with an i7-1255U and was configured to compile a debug build. Configure times are not shown because they are pretty small for both systems.

# setuptools baseline
time python setup.py build_ext -j8 --inplace --with-debugging-symbols
real	0m36.054s
user	3m28.768s
sys	0m6.246s

# CMake Default generator
cmake . -DCMAKE_BUILD_TYPE=Debug
time cmake --build .  --parallel
real	1m49.447s
user	13m51.311s
sys	0m18.391s

# CMake Ninja generator
cmake . -DCMAKE_BUILD_TYPE=Debug -G Ninja
time cmake --build .  --parallel
real	1m46.156s
user	14m50.138s
sys	0m19.017s

# Meson
meson setup builddir
cd builddir
# You might not want to do this. See below for alternative.
meson configure '-Dpython.install_env=auto'
time meson compile

real	1m43.980s
user	13m28.714s
sys	0m18.287s

Surprised setuptools did so well, but this ultimately could vary a lot depending on platform and degree of parallelization. CMake and Meson will likely be pretty similar to each other.

@eli-schwartz
Contributor

Those are some interesting numbers...

What's the value of nproc on that system? Was setup.py cleaned before doing the build?

@WillAyd
Member Author

WillAyd commented Aug 15, 2022

nproc is 12. I must have had something else going on with my laptop yesterday. When I run today I get these numbers:

# setuptools baseline
time python setup.py build_ext -j12 --inplace --with-debugging-symbols
real	1m31.846s
user	13m7.494s
sys	0m16.192s

# CMake Default generator
cmake . -DCMAKE_BUILD_TYPE=Debug
time cmake --build .  --parallel

real	0m23.556s
user	3m17.058s
sys	0m9.060s

# CMake Ninja generator
cmake . -DCMAKE_BUILD_TYPE=Debug -G Ninja
time cmake --build .  --parallel

real	0m22.514s
user	3m30.462s
sys	0m8.625s


# Meson
meson setup builddir
cd builddir
# You might not want to do this. See below for alternative.
meson configure '-Dpython.install_env=auto'
time meson compile

real	1m34.171s
user	12m19.070s
sys	0m17.601s

If someone else wants to try from another machine, that would be helpful. I made sure to rm -rf * && git reset --hard HEAD in between runs for the in-source builds.
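
For anyone repeating this on another machine, a small wall-clock harness along these lines might help keep runs comparable. It is only a sketch: `time_command` and its `cleanup` hook are hypothetical names, the placeholder command merely starts the Python interpreter, and the real build command and clean step have to be supplied by whoever runs it.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=3, cleanup=None):
    """Run `argv` several times and return wall-clock seconds per run.

    `cleanup`, if given, is called before every run so that each build
    starts from the same state (e.g. a function that wipes build output).
    """
    timings = []
    for _ in range(repeats):
        if cleanup is not None:
            cleanup()
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken build is
        # never recorded as a (misleadingly fast) timing.
        subprocess.run(argv, check=True)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Placeholder command: replace with the build invocation under test.
    placeholder = [sys.executable, "-c", "pass"]
    runs = time_command(placeholder, repeats=3)
    print(f"min {min(runs):.3f}s  max {max(runs):.3f}s  over {len(runs)} runs")
```

Reporting the minimum and maximum over a few runs, rather than a single number, makes it easier to spot a run that was disturbed by other activity on the machine.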

@eli-schwartz
Contributor

Now cmake is the one that shot down to 3.5 minutes of user time... weird.
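
One way to read these numbers is to divide CPU time (user + sys) by wall time, which approximates how many cores were kept busy. A rough sketch using the Aug 15 figures quoted above; the helper names are made up for illustration:

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '1m31.846s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# (real, user, sys) copied from the Aug 15 run in this thread.
RUNS = {
    "setuptools -j12": ("1m31.846s", "13m7.494s", "0m16.192s"),
    "cmake (default)": ("0m23.556s", "3m17.058s", "0m9.060s"),
    "cmake (ninja)": ("0m22.514s", "3m30.462s", "0m8.625s"),
    "meson": ("1m34.171s", "12m19.070s", "0m17.601s"),
}


def summarize(runs):
    """Return wall time, total CPU time and their ratio for each build."""
    rows = {}
    for name, stamps in runs.items():
        real, user, system = (to_seconds(s) for s in stamps)
        cpu = user + system
        rows[name] = {"real": real, "cpu": cpu, "cores": cpu / real}
    return rows


if __name__ == "__main__":
    for name, row in summarize(RUNS).items():
        print(
            f"{name:16s} real={row['real']:6.1f}s "
            f"cpu={row['cpu']:6.1f}s busy cores~{row['cores']:.1f}"
        )
```

On these figures the ratio lands between roughly 8 and 10 for all four builds, so the gap seems to come from the total amount of CPU work rather than from how well each tool used the 12 cores. That is only an inference from four data points.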

@WillAyd
Member Author

WillAyd commented Aug 15, 2022

Not very scientific. Both are going to run circles around setuptools when it comes to sdist installs, since setuptools does not parallelize those.

@datapythonista
Member

Thanks a lot for putting this together @WillAyd.

I personally find Meson config way simpler and more readable, and I see more advantages in using the same tool as numpy than in using the same tool as Arrow.

How do we move forward with this? Correct me if I'm wrong, but this doesn't seem like a PDEP intended to be merged, as there is no action proposed, just the discussion (which is clearly very useful).

What do you think about starting with a poll with 3 options: setuptools, cmake and meson (we can add two different options for cmake if you prefer)? Depending on the result, we decide how to make the final decision. I'd make the poll open in pandas-dev, but not anonymous, so we can have more opinions but still know the preferences of core devs, people who implemented or maintain the build in other projects...

@WillAyd
Member Author

WillAyd commented Aug 20, 2022

I think we can move forward with Meson. Sounds like the preferred tool

@WillAyd
Member Author

WillAyd commented Aug 22, 2022

@datapythonista I think we are good on the decision. Does this need to be merged or just closed?

@datapythonista
Member

> @datapythonista I think we are good on the decision. Does this need to be merged or just closed?

In a way it seems that if we'd like to merge and publish this, the PDEP should be a bit more specific about what is being approved and the technical details of the implementation. This has clearly been very useful for deciding whether to move away from the setuptools build, and for hosting the discussion on what people's preferences are. But if we end up with a long list of PDEPs, I'm not sure it's worth having one more PDEP in the list mainly to show the advantages of cmake and meson.

So, I personally don't have a strong preference. If it was my decision I'd probably just close it and link to it when implementing Meson, but happy with whatever you prefer.

@WillAyd
Member Author

WillAyd commented Aug 23, 2022

Sounds good. I think we can just close then. Assuming @lithomas1 will continue working on the Meson implementation, we can always reopen and add those details if deemed worthwhile.

@WillAyd WillAyd closed this Aug 23, 2022
@zlspgz

zlspgz commented Oct 11, 2022 via email
