In this article, we will discuss how to merge Pandas DataFrame based on the closest DateTime. To learn how to merge DataFrames first you have to learn that how to create a DataFrame for that you have to refer to the article Creating a Pandas DataFrame. After creating DataFrames need to merge them and to merge the Dataframe there's a function named merge_asof() when it comes to writing this then it can be written as:
pandas.merge_asof(left, right, on=None, left_on=None, right_on=None, left_index=False, right_index=False, by=None, left_by=None, right_by=None, suffixes=('_x', '_y'), tolerance=None, allow_exact_matches=True, direction='backward')
To complete this task we have to import the library named Pandas.
In this step, we have to create DataFrames using the function "pd.DataFrame()". In this, we created 2 data frames one is named left and another is named right because our last goal is to merge 2 data frames based on the closest DateTime. It can be written as:
left = pd.DataFrame( {
"time": [pd.Timestamp("2020-03-25 13:30:00.023"),
pd.Timestamp("2020-03-25 13:30:00.023"),
pd.Timestamp("2020-03-25 13:30:00.030"),
pd.Timestamp("2020-03-25 13:30:00.041"),
pd.Timestamp("2020-03-25 13:30:00.048"),
pd.Timestamp("2020-03-25 13:30:00.049"),
pd.Timestamp("2020-03-25 13:30:00.072"),
pd.Timestamp("2020-03-25 13:30:00.075")
],
"ticker": ["GOOG","MSFT","MSFT","MSFT","GOOG","AAPL","GOOG","MSFT"],
"bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
"ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03]
})
right = pd.DataFrame( {
"time": [
pd.Timestamp("2020-03-25 13:30:00.023"),
pd.Timestamp("2020-03-25 13:30:00.038"),
pd.Timestamp("2020-03-25 13:30:00.048"),
pd.Timestamp("2020-03-25 13:30:00.048"),
pd.Timestamp("2020-03-25 13:30:00.048")
],
"ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
"price": [51.95, 51.95, 720.77, 720.92, 98.0],
"quantity": [75, 155, 100, 100, 100]
})
In this step, the data frames are going to be merged using the function "pd.merge_asof()". The result of the merge_asof() function is stored in a variable and then the variable is printed by using "print()".
It created the same data frame into 2 frames one is indicated as x and another is created as y i.e. bid_x, bid_y, ask_x, ask_y.