How to join specific columns in Pyspark

In pandas, a join on specific columns is performed by this code:


I tried similar logic using PySpark:

datamonthly = datamonthly.join(datalabel
            , datamonthly['msisdn'] == datalabel['msisdn']
            , 'left' ).select(datamonthly['*'],'application_type','msisdn','periodloan'))

Here’s the error message:

TypeError: Invalid argument, not a string or column: DataFrame[application_type: string, msisdn: string, periodloan: string] of type <class 'pyspark.sql.dataframe.DataFrame'>. For column literals, use 'lit', 'array', 'struct' or 'create_map' function.


I think the problem is in the select portion of the code: select accepts only column names or Column objects, and the error shows it was given a DataFrame. Try this:

datamonthly = datamonthly.alias('datamonthly').join(datalabel
        , datamonthly['msisdn'] == datalabel['msisdn']
        , 'left' ).select('datamonthly.*' , datalabel.application_type, datalabel.msisdn, datalabel.periodloan)

Note that you can select by doing: