I have the following code:
    Yb = pd.DataFrame(y, columns=['something'])
    df_merge = pd.merge(Yb, file, on='something', how='left')
I don’t quite understand what this code does. What does on= do here?
columns: Index or array-like. Column labels to use for resulting frame when data does not have them, defaulting to RangeIndex(0, 1, 2, …, n). If data contains column labels, will perform column selection instead.
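To make the two behaviours in that description concrete, here is a minimal sketch. The variable names and sample values are made up for illustration and are not from the question.

```python
import pandas as pd

# Case 1: the data has no column labels (a plain list of lists),
# so `columns` supplies the labels.
rows = [[1, "a"], [2, "b"]]
labeled = pd.DataFrame(rows, columns=["id", "letter"])

# Case 2: the data already has labels (a dict of columns),
# so `columns` selects and orders the existing columns instead.
records = {"id": [1, 2], "letter": ["a", "b"], "extra": [0.1, 0.2]}
selected = pd.DataFrame(records, columns=["letter", "id"])

print(labeled)
print(selected)
```

In the second case the "extra" column is dropped because it is not listed in columns.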
y is the data being passed in, and the columns argument is, well, the column labels. Here is a simple example.
    # Import the pandas library
    import pandas as pd

    # Initialize a list of lists
    data = [['tom', 10], ['nick', 15], ['juli', 14]]

    # Create the pandas DataFrame
    df = pd.DataFrame(data, columns=['Name', 'Age'])

    # Print the DataFrame
    print(df)
That will output something like this:

       Name  Age
    0   tom   10
    1  nick   15
    2  juli   14
With df_merge, we are essentially combining data. pd.merge() requires two arguments, the left DataFrame and the right DataFrame, so Yb and file are the two DataFrames being merged. Here are the other arguments:
how: This defines what kind of merge to make. It defaults to ‘inner’, but other possible options include ‘outer’, ‘left’, and ‘right’.
on: Use this to tell merge() which columns or indices (also called key columns or key indices) you want to join on. This is optional. If it isn’t specified, and left_index and right_index are False, then columns from the two DataFrames that share names will be used as join keys. If you use on, then the column or index you specify must be present in both objects.
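As a rough illustration of those two arguments, the sketch below merges two small made-up DataFrames (left, right, and their column names are assumptions for the example). It checks that omitting on falls back to the shared column name, then records which keys survive under each how option.

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

# With `on` omitted, the shared column name "key" is used as the join key,
# so these two calls give the same result.
implicit = pd.merge(left, right)
explicit = pd.merge(left, right, on="key")

# Collect which keys survive under each `how` option.
keys_by_how = {}
for how in ["inner", "outer", "left", "right"]:
    merged = pd.merge(left, right, on="key", how=how)
    keys_by_how[how] = sorted(merged["key"].tolist())
    print(how, keys_by_how[how])
```

Here ‘inner’ keeps only keys found on both sides, ‘outer’ keeps every key, and ‘left’ and ‘right’ keep every key from the named side.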
In this case, how is set to ‘left’. Using a left outer join will leave your new merged DataFrame with all rows from the left DataFrame, while discarding rows from the right DataFrame that don’t have a match in the key column of the left DataFrame. Rows from the left DataFrame that have no match get NaN in the columns that come from the right DataFrame.
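Here is a small sketch of the code from the question with a left join. The contents of y and file are not given in the question, so the values below are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical stand-ins for `y` and `file` from the question.
y = [1, 2, 3]
file = pd.DataFrame({"something": [2, 3, 4], "value": ["x", "y", "z"]})

Yb = pd.DataFrame(y, columns=["something"])
df_merge = pd.merge(Yb, file, on="something", how="left")

# Every row of Yb is kept; key 1 has no match in `file`, so its
# "value" is NaN. Key 4 exists only in `file`, so it is dropped.
print(df_merge)
```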
on is set to ‘something’, so it will merge specifically on the ‘something’ column, which must be present in both Yb and file.
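If the column named in on is missing from one of the DataFrames, the merge fails with a KeyError. The sketch below uses made-up DataFrames to show that, along with one possible workaround using left_on and right_on when the key column has a different name on each side.

```python
import pandas as pd

left = pd.DataFrame({"something": [1, 2]})
right = pd.DataFrame({"other": [1, 2], "value": ["x", "y"]})

# "something" is not a column of `right`, so on="something" cannot work.
try:
    pd.merge(left, right, on="something", how="left")
    merge_failed = False
except KeyError as err:
    merge_failed = True
    print("merge failed, missing key:", err)

# One possible fix: name the key column on each side separately.
fixed = pd.merge(left, right, left_on="something", right_on="other", how="left")
print(fixed)
```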
Hope this helped.