[pull] master from beetbox:master #1
Open
pull wants to merge 1,902 commits into blackpjotr:master from beetbox:master
Conversation
snejus force-pushed the master branch 3 times, most recently from c874689 to aa0db04 on November 22, 2024 01:36
…ypi push" This reverts commit cf3acec.
This utilises regex substitution in the substitute plugin. The previous approach only used regex to match the pattern, then replaced it with a static string. This change allows more complex substitutions, where the output depends on the input.

### Example use case

Say we want to keep only the first artist of a multi-artist credit, as in the following list:

```
Neil Young & Crazy Horse -> Neil Young
Michael Hurley, The Holy Modal Rounders, Jeffrey Frederick & The Clamtones -> Michael Hurley
James Yorkston and the Athletes -> James Yorkston
```

This would previously have required three separate rules, one for each resulting artist. By using a regex substitution, we can get the desired behaviour in a single rule:

```yaml
substitute:
  ^(.*?)(,| &| and).*: \1
```

(Capture the text until the first `,`, ` &`, or ` and`, then use that capture group as the output.)

### Notes

I've kept the previous behaviour of only applying the first matching rule, but I'm not 100% sure it's the ideal approach. I can imagine both cases where you want to apply several rules in sequence and cases where you want to stop after the first match.
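As a rough illustration of how a single rule of this shape behaves, here is a minimal Python sketch using `re.sub` directly (a standalone demo, not the plugin's actual code path):

```python
import re

# Hypothetical stand-in for one configured substitute rule: pattern -> replacement.
RULE = (r"^(.*?)(,| &| and).*", r"\1")

def apply_rule(artist: str) -> str:
    """Return the artist credit with the rule applied."""
    pattern, replacement = RULE
    return re.sub(pattern, replacement, artist)

for credit in [
    "Neil Young & Crazy Horse",
    "Michael Hurley, The Holy Modal Rounders, Jeffrey Frederick & The Clamtones",
    "James Yorkston and the Athletes",
]:
    print(apply_rule(credit))
# Neil Young
# Michael Hurley
# James Yorkston
```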
Quick fix for #5467. Check whether the Python executable lives under the Windows Store folder; if so, raise an error and point the user to the beets [documentation](https://beets.readthedocs.io/en/stable/guides/main.html). Happy for feedback to improve this, but I thought it best to exit as early as possible.
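A minimal sketch of such an early check, assuming the Store install can be recognised by a `WindowsApps` component in the interpreter path (the heuristic and the message are illustrative, not the PR's actual code):

```python
import sys

def running_windows_store_python() -> bool:
    # Assumed heuristic: Microsoft Store installs of Python expose paths
    # containing a "WindowsApps" directory.
    return sys.platform == "win32" and "WindowsApps" in sys.base_prefix

if running_windows_store_python():
    sys.exit(
        "beets: you appear to be running the Microsoft Store version of Python; "
        "see https://beets.readthedocs.io/en/stable/guides/main.html"
    )
```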
Co-authored-by: Šarūnas Nejus <[email protected]>
This is based on the following comment: codecov/codecov-action#1594 (comment)
Fixes #5148. When importing, the code that matches tracks does not consider the medium number. This causes problems on Hybrid SACDs (and other releases) where the artists, track numbers, titles, and lengths are the same on both layers. I added a distance penalty for mismatching medium numbers.

Before:

```
$ beet imp .
/Volumes/Music/ti/Red Garland/1958 - All Mornin' Long - 1 (6 items)
Match (95.4%): The Red Garland Quintet - All Mornin' Long
≠ media, year
MusicBrainz, 2xHybrid SACD (CD layer), 2013, US, Analogue Productions, CPRJ 7130 SA, mono
https://musicbrainz.org/release/6a584522-58ea-470b-81fb-e60e5cd7b21e
* Artist: The Red Garland Quintet
* Album: All Mornin' Long
* Hybrid SACD (CD layer) 1
  ≠ (#2-1) All Mornin' Long (20:21) -> (#1-1) All Mornin' Long (20:21)
  ≠ (#2-2) They Can't Take That Away From Me (10:24) -> (#1-2) They Can't Take That Away From Me (10:27)
  ≠ (#2-3) Our Delight (6:23) -> (#1-3) Our Delight (6:23)
* Hybrid SACD (CD layer) 2
  ≠ (#1-1) All mornin' long (20:21) -> (#2-1) All Mornin' Long (20:21)
  ≠ (#1-2) They can't take that away from me (10:27) -> (#2-2) They Can't Take That Away From Me (10:25)
  ≠ (#1-3) Our delight (6:23) -> (#2-3) Our Delight (6:23)
➜ [A]pply, More candidates, Skip, Use as-is, as Tracks, Group albums, Enter search, enter Id, aBort, eDit, edit Candidates?
```

Note that all tracks tagged with disc 1 get moved to disc 2 and vice versa.

After:

```
$ beet-test imp .
/Volumes/Music/ti/Red Garland/1958 - All Mornin' Long - 1 (6 items)
Match (95.4%): The Red Garland Quintet - All Mornin' Long
≠ media, year
MusicBrainz, 2xMedia, 2013, US, Analogue Productions, CPRJ 7130 SA, mono
https://musicbrainz.org/release/6a584522-58ea-470b-81fb-e60e5cd7b21e
* Artist: The Red Garland Quintet
* Album: All Mornin' Long
* Hybrid SACD (CD layer) 1
  ≠ (#1-1) All mornin' long (20:21) -> (#1-1) All Mornin' Long (20:21)
  ≠ (#1-2) They can't take that away from me (10:27) -> (#1-2) They Can't Take That Away From Me (10:27)
  ≠ (#1-3) Our delight (6:23) -> (#1-3) Our Delight (6:23)
* Hybrid SACD (SACD layer) 2
  * (#2-1) All Mornin' Long (20:21)
  * (#2-2) They Can't Take That Away From Me (10:24)
  * (#2-3) Our Delight (6:23)
➜ [A]pply, More candidates, Skip, Use as-is, as Tracks, Group albums, Enter search, enter Id, aBort, eDit, edit Candidates?
```

Yay!
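The idea behind the penalty can be sketched as follows (hypothetical function and names; beets' real matcher folds this into its overall distance computation):

```python
from typing import Optional

def medium_penalty(item_disc: Optional[int], track_medium: Optional[int]) -> float:
    """Return 1.0 (maximum penalty) when the file's tagged disc number and the
    candidate track's medium disagree, 0.0 otherwise.

    Missing values are treated as a match so untagged files are not penalised.
    """
    if item_disc is None or track_medium is None:
        return 0.0
    return 0.0 if item_disc == track_medium else 1.0
```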
The fix is based on the following comment: codecov/codecov-action#1594 (comment)
Except under GitHub CI, where we expect all tests to run.
…y prior to users importing their music library
… getting started guide #4820

## Description

Added a quick checkpoint to ensure the config file is set up correctly before users import their music library. This was something I discovered after running into an issue with my own config file; I hope it helps new users avoid the problems I had.
This commit introduces a distance threshold mechanism for the Genius and Google backends.

- Create a new `SearchBackend` base class with a `check_match` method that performs the check.
- Start using the previously undocumented `dist_thresh` configuration option for good, and mention it in the docs. It controls the maximum allowable distance for matching artist and title names.

These changes aim to improve the accuracy of lyrics matching, especially when there are slight variations in artist or title names; see #4791.
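A minimal sketch of what a `check_match` on a shared base class might look like, using `difflib` as a stand-in for the plugin's own string-distance helper (names and the default value are illustrative):

```python
from difflib import SequenceMatcher

class SearchBackend:
    def __init__(self, dist_thresh: float = 0.1):  # illustrative default, not the plugin's
        self.dist_thresh = dist_thresh

    @staticmethod
    def _dist(a: str, b: str) -> float:
        # 0.0 for identical strings, 1.0 for completely different ones.
        return 1 - SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def check_match(self, item_artist, item_title, result_artist, result_title) -> bool:
        """Accept a search result only if both artist and title are close enough."""
        return (
            self._dist(item_artist, result_artist) <= self.dist_thresh
            and self._dist(item_title, result_title) <= self.dist_thresh
        )
```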
Improve requests performance by using `requests.Session`, which provides connection pooling for repeated requests to the same host. This also centralizes request configuration, making sure that we use the same timeout and send the beets user agent for all requests.
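A sketch of a session with a default timeout and user agent, along the lines of the `TimeoutSession` mentioned later in this thread (the values are illustrative):

```python
import requests

USER_AGENT = "beets"  # illustrative; the plugin builds its own user agent string
TIMEOUT = 10          # seconds, illustrative

class TimeoutSession(requests.Session):
    """A Session that applies a default timeout to every request."""

    def request(self, method, url, **kwargs):
        kwargs.setdefault("timeout", TIMEOUT)
        return super().request(method, url, **kwargs)

session = TimeoutSession()
session.headers.update({"User-Agent": USER_AGENT})
```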
Having removed it, I found that only the Genius lyrics changed: they had an extra newline. I therefore defined a `collapse_newlines` function, which is now called for the Genius lyrics.
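A plausible shape for such a helper, assuming "extra newline" means runs of blank lines that should collapse to a single blank line (the real implementation may differ):

```python
import re

def collapse_newlines(text: str) -> str:
    """Collapse runs of three or more newlines into two, i.e. keep at most one blank line."""
    return re.sub(r"\n{3,}", "\n\n", text)
```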
Tidy up the `Google.is_page_candidate` method and remove the `Google.sluggify` method, which duplicated `slug`. Since `GeniusFetchTest` only tested whether the artist name is cleaned up (the rest of the functionality is patched), remove it and move its test cases to the `test_slug` test.
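For context, a `slug`-style helper typically looks something like this (a sketch, not the plugin's exact implementation):

```python
import re
import unicodedata

def slug(text: str) -> str:
    """Lowercase, strip accents and punctuation, and join words with hyphens."""
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    text = re.sub(r"[^\w\s-]", "", text).strip().lower()
    return re.sub(r"[\s_-]+", "-", text)
```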
* Type the response data that the Google Custom Search API returns.
* Exclude some 'letras.mus.br' pages that do not contain lyrics.
* Exclude results from Musixmatch, as we cannot access their pages.
* Improve parsing of the URL title (see the sketch below):
  - Handle long URL titles that get truncated (ending with an ellipsis) for long searches.
  - Remove domains starting with 'www'.
  - Parse the title AND the artist. Previously this would only parse the title, and fetch lyrics even when the artist did not match.
* Remove the now-redundant credits cleanup and checks for valid lyrics.
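A rough sketch of guessing artist and title from a result's URL title, with assumed separators and cleanup rules (the plugin's actual logic is more involved):

```python
import re

def parse_url_title(url_title: str) -> tuple[str, str]:
    """Guess (artist, title) from a search-result title such as
    'Artist - Song Title Lyrics | example.com' (format assumed for illustration)."""
    # Drop a trailing site name and the word "lyrics".
    cleaned = re.sub(r"\s*[|–-]\s*[\w.]+\.\w{2,3}\s*$", "", url_title)
    cleaned = re.sub(r"\blyrics\b", "", cleaned, flags=re.I).strip(" -|")
    artist, _, title = cleaned.partition(" - ")
    return artist.strip(), title.strip()
```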
Additionally, improve HTML pre-processing:

* Ensure a new line between blocks of lyrics text from letras.mus.br.
* Parse a missing last block of lyrics text from lacocinelle.net.
* Parse a missing last block of lyrics text from paroles.net.
* Fix encoding issues with AZLyrics by setting the response encoding to None, allowing `requests` to handle it.
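The AZLyrics fix leans on documented `requests` behaviour: when `Response.encoding` is `None`, `Response.text` falls back to the encoding detected from the body rather than the one declared in the HTTP headers. Roughly:

```python
import requests

# Placeholder URL for illustration only.
response = requests.get("https://www.azlyrics.com/", timeout=10)
# Ignore the (sometimes wrong) charset from the headers and let
# requests detect the encoding from the response body instead.
response.encoding = None
html = response.text
```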
If we get caught by Cloudflare, it forwards our request somewhere else and returns a validation-text response. To make sure this text is not mistaken for lyrics, we disable redirects for the Google backend, check the response code, and raise if there is a redirect attempt. The source is then skipped and the backend continues with the next one.
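A sketch of that flow (the exception type is illustrative):

```python
import requests

class NotFoundError(Exception):
    """Raised when a lyrics source cannot be used (illustrative exception)."""

def fetch_page(session: requests.Session, url: str) -> str:
    # Don't follow redirects: a Cloudflare challenge redirects us to a
    # validation page whose text must not be mistaken for lyrics.
    response = session.get(url, allow_redirects=False, timeout=10)
    if response.is_redirect:
        raise NotFoundError(f"redirected away from {url}; skipping this source")
    response.raise_for_status()
    return response.text
```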
I think we can make our life easier by removing these checks, assuming that users follow the instructions in the docs.
It was my mistake to remove search earlier - I found that in many cases it works fine.
…tionality (#5474)

### Bug Fixes

- Fixed #4791: Resolved an issue with the Genius backend where it couldn't match lyrics if there was a slight variation in the artist's name.

### Plugin Enhancements

* **Session Management**: Introduced a `TimeoutSession` to enable connection pooling and maintain consistent configuration across requests.
* **Error Handling**: Centralized error handling logic in a new `RequestsHandler` class, which includes methods for retrieving either HTML text or JSON data.
* **Logging**: Added methods to ensure the backend name is included in log messages.

### Configuration Changes

* Added a new `dist_thresh` field to the configuration, allowing users to control the maximum tolerable mismatch between the artist and title of the lyrics search result and their item. Interestingly, this field was previously available (though undocumented) and used in the `Tekstowo` backend. Now, this threshold has also been applied to **Genius** and **Google** search logic.

### Backend Updates

* All backends that perform searches now validate each result against the configured `dist_thresh`.

#### Genius

* Removed the need to scrape HTML tags for lyrics; instead, lyrics are now parsed from the JSON data embedded in the HTML. This change should reduce our vulnerability to Genius' frequent alterations in their HTML structure.
* Documented the structure of their search JSON data.

#### Google

* Typed the response data returned by the Google Custom Search API.
* Excluded certain pages under **https://letras.mus.br** that do not contain lyrics.
* Excluded all results from MusiXmatch, as we cannot access their pages.
* Improved parsing of URL titles (used for matching item/lyrics artist/title):
  - Handled results from long search queries where URL titles are truncated with an ellipsis.
  - Enhanced URL title cleanup logic.
  - Added functionality to determine (or rather, guess) not only the track title but also the artist from the URL title.
* Similar to #5406, search results are now compared to the original item and sorted by distance. Results exceeding the configured `dist_thresh` value are discarded (see the sketch after this summary). The previous functionality simply selected the first result containing the track's title in its URL, which often led to returning lyrics for the wrong artist, particularly for short track titles.
* Since we now fetch lyrics confidently, redundant checks for valid lyrics and credits cleanup have been removed.

### HTML Cleanup

* Organized regex patterns into a new `Html` class.
* Adjusted patterns to ensure new lines between blocks of lyrics text scraped from `letras.mus.br` and `musica.com`.
* Modified patterns to scrape missing lyrics text on `paroles.net` and `lacoccinelle.net`. See the diff in `test/plugins/lyrics_page.py`.
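A minimal sketch of the "sort by distance, discard beyond `dist_thresh`" step referenced above (hypothetical result type and distance helper, not the plugin's code):

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class SearchResult:
    artist: str
    title: str
    url: str

def string_dist(a: str, b: str) -> float:
    # 0.0 for identical strings, 1.0 for completely different ones.
    return 1 - SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_results(item_artist, item_title, results, dist_thresh=0.1):  # illustrative default
    """Sort candidate results by their worst-case distance and drop those above the threshold."""
    scored = [
        (max(string_dist(item_artist, r.artist), string_dist(item_title, r.title)), r)
        for r in results
    ]
    return [r for dist, r in sorted(scored, key=lambda pair: pair[0]) if dist <= dist_thresh]
```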
See Commits and Changes for more details.
Created by pull[bot]
Can you help keep this open source service alive? 💖 Please sponsor : )