More readable `to_compact()` #566

ghost · 2017-10-20T13:20:36Z

It is largely a matter of taste but I find "0.1 millimeter" easier to read than "100 micrometer". It is even more blatant in higher dimensions:

100000 nanometer^2 == 0.1 micrometer^2,

pint currently compacts "1e-13 meter" into the former. (The implementation of to_compact() enforces a magnitude > 0.) With this patch one gets the latter. The tests are updated accordingly.

It is largely a matter of taste but I find "0.1 millimeter" easier to read than "100 micrometer". It is even more blatant in higher dimensions: 100000 nanometer^2 == 0.1 micrometer^2, pint currently compacts "1e-13 meter" into the former. (The implementation of `to_compact()` enforces a magnitude `> 0`.) With this patch one gets the latter. The tests are updated accordingly.

hgrecco · 2017-10-20T20:01:21Z

Thanks for the PR. What about 123000 nanometer. I think here 123 micrometer is clearer that 0.123 milimeter. I am happy to open this for discussion

JesterEE · 2017-10-20T21:12:33Z

I agree with @hgrecco. Though, what would be really slick (in my opinion) is an optional argument for decimal precision instead of forcing a compact representation on the user. That way, pint could return whatever prefix of a unit is necessary to have n decimal places.

For example:

import pint
ureg = pint.UnitRegistry()
q = 123000e-9 * ureg('m')
q.to_compact()
q.to_compact(prec=3)

>>> 123 micrometer
>>> 0.123 millimeter

ghost · 2017-10-23T12:51:10Z

Ok, my thoughts on your comments, where it is shown that @JesterEE's proposition is easy to implement but I lack a good name for it.

First of all, I personally prefer 0.123 millimeter to 123 micrometer ;-). So, matter of taste again. In that sense, giving the choice to the user seems to be the way to go.

There is a magnitude span of 1000 between two standard metric prefixes. Currently the to_compact scaling returns a magnitude m such that 1 <= m < 1000.

>>> for i in [1000, 100, 10, 1, 0.1, 0.01, 0.001, 0.0001]: 
    print ('{:.0e} metre -> {:#}'.format (i, ( i*u.m ).to_compact()))
... 
1e+03 metre -> 1.0 kilometer
1e+02 metre -> 100 meter
1e+01 metre -> 10 meter
1e+00 metre -> 1 meter
1e-01 metre -> 100.0 millimeter
1e-02 metre -> 10.0 millimeter
1e-03 metre -> 1.0 millimeter
1e-04 metre -> 100.0 micrometer

(FWIW it's also the choice made by Boost Units)

This proposed PR gives 0.1 <= m < 100.

>>> for i in [1000, 100, 10, 1, 0.1, 0.01, 0.001, 0.0001]: 
    print ('{:.0e} metre -> {:#}'.format (i, ( i*u.m ).to_compact()))
... 
1e+03 metre -> 1.0 kilometer
1e+02 metre -> 0.1 kilometer
1e+01 metre -> 10 meter
1e+00 metre -> 1 meter
1e-01 metre -> 0.1 meter
1e-02 metre -> 10.0 millimeter
1e-03 metre -> 1.0 millimeter
1e-04 metre -> 0.1 millimeter

Other options could be 0.01 <= m < 10 and 0.001 <= m < 1. (Note: if the unit is raised to a power, the bounds are raised to the same power.)

This is all controlled by a single parameter X in quantity.y, to_compact, line 469:

power = int(math.floor((math.log10(abs(magnitude)) / unit_power + X) / 3)) * 3

(and its counterpart for negative unit_power line 471). Current behaviour has X = 0, this PR has X = 1 and the other two aforementioned possibilities would have X = 2 and X = 3 respectively. In other words, the magnitude is in the range [1eX, 1e(3-X)). X here is the prec parameter proposed by @JesterEE and as you can see, it's pretty straightforward to implement. Now I'm not fond of the name precision which could be confused with the actual floating point precision displayed (the actual number of digits after the decimal point):

>>> print ("{:.5f}".format((1 * u.m)))
1.00000 meter

So do we want that and more importantly what should we call it?

I'd really want it, I'm willing to implement it but I have no clue on how to name it.

onealj · 2017-10-23T17:38:17Z

Let's try to dig at the reasons why someone may prefer one unit over another (it's different for everyone) and how we can express that as simple rule to tell pint how you prefer your number for a given quantity. Hopefully we can find a rule that is consistent across different units (kg, meter, second), units raised to a power (meter), and mixed units (kg*m/s, g*km/s) and that doesn't require user code checking the quantity before informing pint of your preferences.

@JesterEE's suggested precision, which I agree that the name seems vague as to what it should do. Users would confuse it for the number of values to display after the decimal, a string formatting operation, not a unit conversion.

The way I would describe how to_compact works is that it performs a unit conversion by choosing the largest unit where the absolute value of the quantity is greater than it equal to 1. For compound units, it follows some order of operations (kg*m/s vs g*km/s) using the kilogram-meter-second system (for example), preferring base units for other units. If quantity is 0, use base units.

In code, I would express this as defining a lower threshold (default=1) used to determine the largest unit. This is essentially the same as @JesterEE'd precision: 10^-prec=thresh, without the undefined fractional precision behavior or bias towards base 10 (binary prefixes, 1000 MiB, 0.977 GiB). (Let's not constrain the API to base 10, since the implementation that uses log10 to quickly pick the largest unit could change in the future.)

(123*u.micrometer).to_compact(threshold=0.001) == 0.123 millimeter.

The upper threshold is dictated by the minimum of the exponents in the units. For example, meter will have the range [1, 1000) and meter² will have a range [1, 1000000). At 1,000,000 m², it will switch to 1 km². And sometimes the next unit isn't a multiple of 1000. For example, millimeter-centimeter-meter-kilometer and microsecond-millisecond-second-minute-hour-day. The current to_compact converts units that are strictly multiples of 1000, but that could change in the future (unless that isn't the point of to_compact)

I think the reason you prefer 0.123 millimeters over 123 micrometers is because you more regularly use millimeters and you can visualize the value better. On the other hand, someone who regularly works with microscopic things may be able to conceptualize 123 micrometers better than 0.123 millimeters. I prefer 200 ns over 0.200 µs because I routinely work with signals that travel at speeds close to the speed of light, and I can use 1 nanosecond ≈ 1 foot to determine the approximate distance a signal traveled given the elapsed time.

If you prefer 123 km over 0.123 Mm, then specifying the lower threshold isn't enough to get the desired units. If user code has to check the quantity to figure out what the lower threshold should be, it might as well specify the unit as (1.23e3*u.m).to_compact(unit='km').

If a new parameter or context were to be added, it would be mutually exclusive with the existing unit argument of to_compact.

Maybe the maximum unit where the number is greater than 0.001 rule isn't really what we want. Perhaps it would be better to select the unit closest to the base unit: if quantity is less than 1*baseunit, use a lower threshold of 0.001, otherwise use a lower threshold of 1. This solves the kilometer and megameter problem. But doesn't solve the gram and kilogram problem (then again, 0.98 kg and 980 g are equally understandable to me, so I don't care which to_compact produces as long as it's deterministic). It also doesn't solve the meter² and meter³ problem. The caller of to_compact would need to be aware of the degree of the unit to specify 1e-3 as the lower threshold for units of degree 1, 1e-6 for degree 2, 1e-9 for degree 3 (or maybe 1e-4 to split the difference between a really large number and really small number), etc.

Maybe a better solution to determine the preferred unit is to write a method that wraps to_compact. This is where you could implement the logic that decides that 0.123 mm is preferred over 123 µm, but 123 km is preferred over 0.123 Mm, and when to switch from m² to km², and how to deal with compound units.

Can you propose a function (that everyone can agree on) that can determine what the preferred units are for a given quantity? I don't think this problem can be reduced down to specifying one or two numbers as an argument to a function.

JesterEE · 2017-10-24T23:49:21Z

It's great that other's are taking some active interest in this PR! The more voices, the better chance we have on getting it right!

I do agree precision might not be the right name of the argument to identify the use. As pointed out, this could (and likely would) be confused with a string formatting which has a completely different meaning in the language with it's own syntax. Even though that's probably what most people would use to_compact for ... it's actually something very different and should reflect that.

I'll throw another one in: scale. This would be mirroring the SQL Server definition and Microsoft Transact-SQL where:
scale: The number of decimal digits that will be stored to the right of the decimal point.

for our purposes

scale: The number of decimal digits that will be ~~stored~~ preferred to the right of the decimal point.

Let's brainstorm some more!

to_compact method argument name suggestions

~~precision~~ [@JesterEE]
threshold (Would require a different input and processing (int => float)) [@onealj]
scale [@JesterEE]

@emmanelletrong
Great work! I think consistency is important on PRs and changing default behavior is only acceptable when necessary. While that's just my 2 cents, I can understand why you'd want X=1 rather than X=0. If to_compact() was not already a "thing", I'd say implement it as you'd like it ... but, it is a "thing". Changing X=1 may break people's established code. Would it be minor ... sure, but for what reason?

@onealj
Some good points about non-base10 units. This can be seen when using to_compact with time quantities. It isn't smart enough to use the registry to find "compatible" units to convert to and just sticks to SI prefixes.

import pint
u = pint.UnitRegistry()
'{:0.2f}'.format(u('1000 min').to_compact())
>>> 1.00 kilominute
'{:0.2f}'.format(u('1000 min').to('sec').to_compact())
>>> 60.00 kilosec

However, I would caution automating this too much because in the current state of pint the compatible_units() method returns a frozenset. This is a great use of this container ... except for instances like this. To make sure the output is tailored successfully, there would need to a 'priority' assigned to the possible compatible conversions so that higher priority units were returned when the "criteria" (whatever that is to be) is met for that unit. This again is very user specific but maybe beyond the scope of what to_compact should be trying to figure out. Here's a hypothetical example where this could cause an undesired output and a hypothetical implementation to address it (which I personally do not like). Any implementation that does something like this would have to be very clever ... even more clever than pint itself can be currently (shown below).

import pint
u = pint.UnitRegistry()
time = u('6000 sec')
time.compatible_units()  # Note some time scaled units (e.g. minute) are not defined here!
>>> frozenset({<Unit('shake')>,
>>>            <Unit('sidereal_second')>,
>>>            <Unit('second')>,
>>>            <Unit('sidereal_minute')>,
>>>            <Unit('sidereal_hour')>,
>>>            <Unit('sidereal_day')>,
>>>            <Unit('week')>,
>>>            <Unit('work_month')>,
>>>            <Unit('sidereal_month')>,
>>>            <Unit('tropical_month')>,
>>>            <Unit('synodic_month')>,
>>>            <Unit('work_year')>,
>>>            <Unit('gregorian_year')>,
>>>            <Unit('julian_year')>,
>>>            <Unit('sidereal_year')>,
>>>            <Unit('leap_year')>,
>>>            <Unit('eon')>})
'{:0.2f}.format(time.to_compact())  # Lets assume to_compact arbitrarily tries all the compatible units till it finds a "good" one, though this would be a bad approach
>>> 1.67 sidereal_hour
'{:0.2f}.format(time.to_compact())  # Maybe a different one this time ... let's see
>>> 100.27 sidereal_minute
'{:0.2f}.format(time.to_compact(unit_priority=['second', 'minute', 'hour', 'day', 'week', 'month', 'year']))  # 'If' to_compact would do this ... provide yet another argument
>>> 100 minute
'{:0.2f}.format(time.to_compact((unit_priority=['year', 'month', 'week', 'day', 'hour', 'minute', 'second']))  # A different priority
>>> 1.67 hour

Also, I don't think specifying the desired unit in to_compact() is the right approach. Using your example above, a user could (and it is better practice) use the to method of a pint.quantity to get the same result:

import pint
u = pint.UnitRegistry()
(1.23e3*u.m).to('km')
>>> 1.23 kilometer

I think that to_compact is solving something completely different. That is, giving the user the "best looking" representation of a quantity, rather than a quantity tailored to a specific unit scaling. Since "best looking" is subjective, the function should be configurable to that trait. Though, the nice thing about 'to_compact' is it's smart for you. The more you have to give it, (desired units, priorities, thresholds, etc.) the more you can just do it on your own with "manual" conversions to units and scales.

I'd rather see an under-engineered solution that doesn't do it all, but is easy to use rather than an over-engineered solution that takes lots of input to get right. But, if you want to implement all that, it may prove to be useful and I will be nothing short of encouraging along the way. Though it's probably way more involved than @emmanelletrong had in mind for this PR.

hgrecco · 2017-10-25T02:57:16Z

@JesterEE is completely right when mentions that the purpose of to_compact is:

giving the user the "best looking" representation of a quantity, rather than a quantity tailored to a specific unit scaling.

to_compact should not be conversion to a specific unit nor a string formatting operation. We have specific functions for those. We should provide a way to configure what best looking means, but not in terms of a specific unit and also avoid confusion with formatting operations. Therefore I agree to move away from precision. (I just realized that I am repeating what @JesterEE said, but he is right)

I do not have yet a strong opinion about the other names.

Regarding non-base10 units, I think it should be done based on the active system and we might want to move that discussion to another thread/PR. If we improve to_compact API for SI prefixed units, then it should work out for the rest in the future. However, it needs to be clear that currently Pint does not have all the information and I think that unit_priority is not the right way to provide it.

As a historical note, when Systems and Groups were implemented I played with the idea of introducing a way to describe "order and preferred units". The idea was that the system (or the group, I was not clear where was best to do it) contains information about which units (among all in available) constitute a useful ladder. So you could ask give me the next unit in this system/group larger than a foot and it will return yard.

While the syntax is not important, I think it is useful to show to clarify the idea. I tried different, one was:

@system imperial using ImperialVolume, USCSLengthInternational, AvoirdupoisUK
    inch < foot < *yard < mile
    ounce < *pound
@end

where the star shows the base unit of system. You could also have something for time second < minute < hour < day < year.

I think something like this could be more extendable and robust, useful for compatible_units and for to_compact. But as I said, I think we can improve to_compact without fixing this and move this discussion to another thread.

onealj · 2017-10-25T04:52:57Z

I like the csv python standard library's use of a dialect object that informs the CSV reader or writer of rules it should use, such as delimiter andend of line character.

It sounds like the unit argument for to_compact() should be retired, since explicitly defining which unit makes the value most readable is best done as quantity.to(unit).

Rather than having an ever growing list of arguments to try to inform to_compact() what units and what number range is preferable for a given quantity, how about a dialect class that the user provides implement a function that makes this determination. I don't think pint needs to provide example dialects like the csv module does, but an example would be an astronomer defining an astronomical dialect that prefers sidereal day, arc second, parsec or light year, etc. A computer scientist may prefer milliseconds, even if it results in a quantity larger than 10,000 ms.

Specifying a dialect class that defines a function to convert a quantity into the best human readable format based on rules implemented in that class would mean to_compact's argument list won't be ever changing as we figure out new information that we need to furnish to to_compact for it to return the best human readable quantity.

To me, the name to_compact doesn't make it obvious that this function exists solely to provide the "best looking representation of a quantity". Compact to me means to minimize the space something takes up, such as minimizing the string length (0.1 s is 1 character shorter than 100 ms--does that mean it's more compact?). If we're talking about best looking representation for human consumption, "pretty print" is the term I've most commonly seen used to describe this. Are we open to a new name to make this function more discoverable?

hgrecco · 2017-10-25T12:48:36Z

Just a comment regarding the unit parameter. Its purpose was to help pint regarding conversion between equivalent simple/compound units. So instead of having kg m / s^2 you could ask to express the result in a (probably prefixed) N. So it is not just unit conversion because it adds and heuristic to determine what is pretty.

I think that the unit parameter in to_compact should not be used to go from kilometer to meter.

hgrecco · 2017-10-25T12:56:04Z

Bu the way, comment in issue #569 about what should be included in 0.9

hgrecco · 2019-12-28T02:04:39Z

While I think this discussion was very useful, I am closing this PR until we can get a good solution.

DAsh0244 mentioned this pull request Nov 14, 2017

Question: to_compact for Measurements #584

Closed

hgrecco closed this Dec 28, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

More readable `to_compact()` #566

More readable `to_compact()` #566

Uh oh!

ghost commented Oct 20, 2017

Uh oh!

hgrecco commented Oct 20, 2017

Uh oh!

JesterEE commented Oct 20, 2017 •

edited

Loading

Uh oh!

ghost commented Oct 23, 2017

Uh oh!

onealj commented Oct 23, 2017 •

edited

Loading

Uh oh!

JesterEE commented Oct 24, 2017

Uh oh!

hgrecco commented Oct 25, 2017

Uh oh!

onealj commented Oct 25, 2017

Uh oh!

hgrecco commented Oct 25, 2017

Uh oh!

hgrecco commented Oct 25, 2017

Uh oh!

hgrecco commented Dec 28, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

More readable to_compact() #566

More readable to_compact() #566

Uh oh!

Conversation

ghost commented Oct 20, 2017

Uh oh!

hgrecco commented Oct 20, 2017

Uh oh!

JesterEE commented Oct 20, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ghost commented Oct 23, 2017

Uh oh!

onealj commented Oct 23, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JesterEE commented Oct 24, 2017

to_compact method argument name suggestions

Uh oh!

hgrecco commented Oct 25, 2017

Uh oh!

onealj commented Oct 25, 2017

Uh oh!

hgrecco commented Oct 25, 2017

Uh oh!

hgrecco commented Oct 25, 2017

Uh oh!

hgrecco commented Dec 28, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

More readable `to_compact()` #566

More readable `to_compact()` #566

JesterEE commented Oct 20, 2017 •

edited

Loading

onealj commented Oct 23, 2017 •

edited

Loading