Identifying genuine phone numbers

I have a dataset with a dedicated column for phone numbers. My task is to validate it, since there are wrong entries like “9999999999”, “0123456789” and many others of a similar nature. I thought of tackling the issue by identifying the carrier names, so that entries like the ones above can easily be discarded, because they won't have a carrier name. I came across a package called phonenumbers and used the code below:

import phonenumbers
from phonenumbers import carrier
ro_number = phonenumbers.parse("+91xxxxxxxxxx") # number is redacted purposely
carrier.name_for_number(ro_number, "en")

This gave the output 'BSNL MOBILE'. I want to run this on the entire column of the dataframe, creating a new column that records the carrier name for each number.

I tried using a for loop:

for i in df['phone_number']:
    ro_number = phonenumbers.parse(i)
    carrier.name_for_number(ro_number, "en")

But I got the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-80-af01b9d8c9ef> in <module>
      1 for i in merged_Data['SELLER_NUMBER']:
----> 2     ro_number = phonenumbers.parse(i)
      3     carrier.name_for_number(ro_number, "en")

in parse(number, region, keep_raw_input, numobj, _check_region)
   2834         raise NumberParseException(NumberParseException.NOT_A_NUMBER,
   2835                                    "The phone number supplied was None.")
-> 2836     elif len(number) > _MAX_INPUT_STRING_LENGTH:
   2837         raise NumberParseException(NumberParseException.TOO_LONG,
   2838                                    "The string supplied was too long to parse.")

TypeError: object of type 'int' has no len()

I'm not sure this is the right way to iterate over the entire column. Any help would be much appreciated.


I made two changes to the code:

  1. Use the is_valid_number method to check whether the number belongs to an assigned exchange.
  2. Specify a region (such as “US”), since using None did not work for the test case “18004444444”, which is an MCI phone test number.


import phonenumbers
from phonenumbers import carrier

def valid_number(number, region="US"):
    '''Check the validity of a phone number (defaults to the US region).

    US is used as the default region because some numbers did not work
    with region=None.
    '''
    # Parse the string into a PhoneNumber object
    phone_number = phonenumbers.parse(number, region)
    # Validate the number (i.e. check that it is in an assigned exchange)
    return phonenumbers.is_valid_number(phone_number)

Test With List

data = ["+442083661177", "+123456789", "18004444444"]

for i in data:
    print(i, valid_number(i))

# Output
+442083661177 True
+123456789 False
18004444444 True    # note: this number doesn't work with default region = None
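`is_valid_number` only checks a number against the numbering plan's lengths and prefix patterns; it cannot tell whether the number is actually in service, so a placeholder that happens to fit a pattern may still come back `True`. For the junk entries described in the question (“9999999999”, “0123456789”), a cheap complementary check is a plain string heuristic. This is not part of phonenumbers: the rules (all digits identical, or a run of consecutive ascending or descending digits) are my assumption of what counts as a placeholder, and `looks_like_placeholder` is a hypothetical name.

```python
import re


def looks_like_placeholder(value):
    """Heuristic flag for obviously fake numbers (hypothetical rules)."""
    digits = re.sub(r"\D", "", str(value))  # keep digits only
    if len(digits) < 2:
        return True  # nothing usable to check
    # Rule 1: every digit is the same, e.g. "9999999999"
    if len(set(digits)) == 1:
        return True
    # Rule 2: each digit is one more (or one less) than the previous,
    # e.g. "0123456789" or "9876543210"
    steps = {int(b) - int(a) for a, b in zip(digits, digits[1:])}
    return steps == {1} or steps == {-1}


for value in ["9999999999", "0123456789", "9876543210", "+442083661177"]:
    print(value, looks_like_placeholder(value))
```

On a DataFrame, something like `df[~df["phone_number"].apply(looks_like_placeholder)]` drops the flagged rows. The rules run on the raw digits, so a number stored with its country code (e.g. a “91” prefix) will not trip the sequential-digit rule; running the check on `str(parsed.national_number)` would be more robust.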

Test With DataFrame

import pandas as pd

df = pd.DataFrame({"phone_number": data})
df['valid'] = df['phone_number'].apply(valid_number)
# Resulting df
    phone_number  valid
0  +442083661177   True
1     +123456789  False
2    18004444444   True