A postal address is often provided as a single string, for example when it is entered by a user or comes from a spreadsheet file. The address string can be stored as is and used as an additional information field. More often, however, it needs to be parsed into address components: house number, street, city, postcode, etc.
This article gives an overview of methods for parsing postal addresses, ordered from basic to more advanced.
Parsing address with Regex
The easiest way to parse an address is to apply a regular expression (regex). This method works well when the addresses have a regular form. For example, if all the address strings look like STREET_NAME XX, YYYYYY CITY_NAME, you can write a regex that splits them into [STREET_NAME, XX, YYYYYY, CITY_NAME]. Here are some Stack Overflow threads with examples of regular expressions for address lines:
- Parsing address with Regex
- Parsing Street Address Using RegEx
- RegEx for parsing addresses in a required format
You can use an online tool such as RegExr to build and test the regular expression.
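As a minimal sketch of this approach, the Python snippet below parses addresses of the form STREET_NAME XX, YYYYYY CITY_NAME. The pattern, the group names, and the `parse_regular_address` helper are illustrative assumptions (for instance, the postcode is assumed to be 4 to 6 digits), not a pattern that fits every country.

```python
import re
from typing import Dict, Optional

# Assumed layout: "<street> <house number>, <postcode> <city>".
# The postcode is assumed to be 4-6 digits; adjust for your data.
ADDRESS_PATTERN = re.compile(
    r"^\s*(?P<street>.+?)\s+(?P<house_number>\d+[A-Za-z]?)\s*,\s*"
    r"(?P<postcode>\d{4,6})\s+(?P<city>.+?)\s*$"
)


def parse_regular_address(text: str) -> Optional[Dict[str, str]]:
    """Return the address components, or None if the string does not fit the layout."""
    match = ADDRESS_PATTERN.match(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_regular_address("Franz-Rennefeld-Weg 8, 40472 Düsseldorf"))
    # An address in a different layout is rejected rather than mis-parsed.
    print(parse_regular_address("19 East Rd, Sheffield S2 3BH, UK"))
```

Returning `None` for strings that do not match makes it easy to route irregular addresses to one of the more advanced methods described later.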
Pros
- An effective way to parse structured strings
- Powerful and configurable
- No external library/service/API needed
Cons
- Difficult to design, debug and support
- Can create performance issues
- Does not work if the addresses are not regular
NPM-packages to parse addresses
There are NPM packages designed to parse addresses. However, most of them focus on the standards of a particular country. They can be a very good option when your addresses belong to one or a few countries. The table below lists examples of NPM packages for parsing postal addresses.
Package | Description | Popularity | Dependencies | License |
---|---|---|---|---|
parse-address | Parsing US addresses | +++ | 1 | ISC |
addresser | Parsing property addresses | ++ | 0 | MIT |
addressit | Freeform street address parser | ++ | 10 | MIT |
humanparser | Parsing human and address information | ++ | 32 | MIT |
address-parse | Analysing and parsing Chinese addresses | + | 6 | MIT |
@minutrade/br-address-parser | Simple parser for Brazilian addresses | + | 0 | MIT |
parse-address-australia | Australia street address parser | + | 0 | ISC |
We recommend trying a few libraries and choosing the one that works best with the addresses you have.
Pros
- Easy and convenient
- Open source and community driven
- No external service/API needed
Cons
- Address formats of only a limited number of countries are covered
- Licenses and dependencies must be checked carefully before using a library in a commercial project
Libpostal - international address parser trained with NLP and open data
Libpostal is a lightweight C library created by Mapzen and designed to parse international addresses. Unlike most other address parsers, Libpostal uses machine learning and is trained on millions of real-world addresses from open data sources.
You can use the library directly or through bindings. Bindings for Python, Go, Ruby, Java, and Node.js are available.
Libpostal is open source and distributed under the MIT license.
Here are some examples of parsing addresses with Libpostal:
> Franz-Rennefeld-Weg 8, 40472 Düsseldorf
Result:
{
"road": "franz-rennefeld-weg",
"house_number": "8",
"postcode": "40472",
"city": "düsseldorf"
}
> P.za Giuseppe Garibaldi, 80142 Napoli NA, Italy
Result:
{
"road": "p.za giuseppe garibaldi",
"postcode": "80142",
"city": "napoli",
"state_district": "na",
"country": "italy"
}
> 19 East Rd, Sheffield S2 3BH, UK
Result:
{
"house_number": "19",
"road": "east rd",
"city": "sheffield",
"postcode": "s2 3bh",
"country": "uk"
}
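Results like the ones above could be produced from Python roughly as follows. This is a sketch that assumes Libpostal and its Python bindings (the `postal` package) are installed; `parse_address` comes from those bindings and returns a list of (value, label) pairs, while `to_components` and `parse_with_libpostal` are hypothetical helpers introduced here for illustration.

```python
from typing import Dict, Iterable, Tuple


def to_components(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Turn (value, label) pairs into a {label: value} dictionary.

    If a label occurs more than once, the values are joined with a space.
    """
    components: Dict[str, str] = {}
    for value, label in pairs:
        if label in components:
            components[label] += " " + value
        else:
            components[label] = value
    return components


def parse_with_libpostal(address: str) -> Dict[str, str]:
    """Parse an address with Libpostal (requires the library and its data files)."""
    # Imported lazily so the helper above works without Libpostal installed.
    from postal.parser import parse_address

    return to_components(parse_address(address))
```

For example, `parse_with_libpostal("Franz-Rennefeld-Weg 8, 40472 Düsseldorf")` would return a dictionary keyed by labels such as `road` and `house_number`.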
Pros
- smart and effective
- parses location-based strings. Can understand categories and phrases like "restaurants", "near", "in", etc.
- open-source product with a permissive license
Cons
- the library must be installed and maintained
- keeps the trained data model in memory and therefore consumes about 4GB of memory
Geocoding API - to parse, standardize, validate and verify addresses
A Geocoding API is the most powerful tool for parsing addresses. Together with the address structure, you get useful location information and latitude/longitude coordinates.
However, keep in mind that the purpose of a Geocoding API is not to parse the address but to find the most suitable existing location for it. For example, if you search for an address that doesn't exist in the geocoder database, it returns the nearest matches, or an empty result if no similar address is found.
Let's have a look at how to parse, standardize, validate and verify addresses with the Geoapify Geocoding API.
The Geoapify Geocoding API returns the following information as a result:
- corresponding address: latitude/longitude coordinates, standardized address, address components
- confidence level for the found address, street, and city
- original address text and parsed address
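A request for such a result could be sent roughly as shown in this Python sketch. The endpoint URL and the `text` and `apiKey` query parameters are assumptions to verify against the current Geoapify documentation, `YOUR_API_KEY` is a placeholder, and `build_geocode_url` and `geocode` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

# Assumed endpoint; verify against the current Geoapify documentation.
GEOCODE_URL = "https://api.geoapify.com/v1/geocode/search"


def build_geocode_url(address: str, api_key: str) -> str:
    """Build the request URL with the free-form address in the `text` parameter."""
    query = urllib.parse.urlencode({"text": address, "apiKey": api_key})
    return f"{GEOCODE_URL}?{query}"


def geocode(address: str, api_key: str) -> Dict[str, Any]:
    """Send the request and return the decoded JSON response."""
    url = build_geocode_url(address, api_key)
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; call geocode() with a real key to send the request.
    print(build_geocode_url("Rue des Francs 70, 6001 Charleroi, Belgium", "YOUR_API_KEY"))
```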
Here is an example of a Geoapify Geocoding API result:
{
"type":"FeatureCollection",
"features":[
{
"type":"Feature",
"properties":{
"datasource":{
"sourcename":"openstreetmap",
"attribution":"© OpenStreetMap contributors",
"license":"Open Database Licence",
"url":"https://www.openstreetmap.org/copyright"
},
"housenumber":"70",
"street":"Rue des Francs",
"district":"Marcinelle",
"city":"Charleroi",
"county":"Hainaut",
"state":"Wallonia",
"country":"Belgium",
"country_code":"be",
"lon":4.433351295038911,
"lat":50.40213325,
"formatted":"Rue des Francs 70, Charleroi, Belgium",
"address_line1":"Rue des Francs 70",
"address_line2":"Charleroi, Belgium",
"result_type":"building",
"rank":{
"importance":0.721,
"popularity":7.051335815095913,
"confidence":1,
"confidence_city_level":1,
"confidence_street_level":1,
"confidence_building_level":1,
"match_type":"full_match"
},
"place_id":"51487c1f71c0bb1140592db1321a79334940f00102f901e388422300000000c00203"
},
"geometry":{
"type":"Point",
"coordinates":[
4.433351295038911,
50.40213325
]
},
"bbox":[
4.4332593,
50.402073,
4.4334427,
50.4021931
]
}
],
"query":{
"text":"Rue des Francs 70, 6001 Charleroi, Belgium",
"parsed":{
"housenumber":"70",
"street":"rue des francs",
"postcode":"6001",
"city":"charleroi",
"country":"belgium",
"expected_type":"building"
}
}
}
- When `rank.confidence` = 1, it's safe to say that the address is valid and verified. The `housenumber`, `street`, `city`, etc. represent the parsed and normalized address.
- When `rank.confidence_city_level` < 1 or `rank.confidence_street_level` < 1, the address can't be 100% verified. However, the most similar address is returned as a result.
- When `rank.confidence_building_level` = 0 and `rank.confidence_street_level` = 1, the address is valid and verified up to street level, but the building was not found.
- When `rank.confidence_building_level` > 0 and `rank.confidence_street_level` = 1, the address is valid up to street level, but there are doubts about the exact building position.
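The rules above can be sketched as a small Python helper that works on the `rank` object of a result. The function name, the returned labels, and the order in which the rules are checked are illustrative assumptions. Missing confidence fields are treated as 0, which is also an assumption.

```python
from typing import Any, Dict


def verification_status(rank: Dict[str, Any]) -> str:
    """Map the confidence values of a geocoding result to a verification label."""
    # Assumption: a missing confidence field is treated as 0 (not confirmed).
    confidence = rank.get("confidence", 0)
    city = rank.get("confidence_city_level", 0)
    street = rank.get("confidence_street_level", 0)
    building = rank.get("confidence_building_level", 0)

    if confidence == 1:
        return "verified"
    if city < 1 or street < 1:
        return "not fully verified"
    # From here on, city and street are confirmed.
    if building == 0:
        return "verified to street level, building not found"
    return "verified to street level, building position uncertain"


if __name__ == "__main__":
    example_rank = {
        "confidence": 1,
        "confidence_city_level": 1,
        "confidence_street_level": 1,
        "confidence_building_level": 1,
    }
    print(verification_status(example_rank))
```

A label like this can then decide whether to accept the normalized address automatically or to send it for manual review.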
Pros
- cross-platform and flexible
- normalizes the address
- validates and verifies the address
- returns address location
Cons
- usually not free when parsing a large number of addresses
- requires additional logic to handle addresses that could not be verified
Which method is the best and what to choose?
We recommend the following algorithm for deciding which method to use when working with address strings:
- If you need address locations or normalized addresses, use a Geocoding API
- If all the addresses are regular and share the same structure, use a regex
- Otherwise, try to find the most suitable NPM library or use Libpostal to parse the addresses
FAQ
How can I parse postal addresses?
Use a regex if the addresses are regular, that is, all formatted the same way. Use an NPM library or Libpostal to parse more complicated addresses. Use a Geocoding API to parse addresses and find the corresponding locations.
Which NPM libraries can be used to parse addresses?
There are many NPM libraries for parsing addresses. Search by the street + address + parse tags. Try several and choose the one that best fits your requirements. Always check licenses before adding a new package to a commercial project.
How can I use Geocoding API to parse addresses?
A Geocoding API looks for a location that corresponds to the provided address. It therefore does more than just parse the address. It tries to understand the address and find the country, state, city, and street it belongs to, and, finally, the house number. As a result, you get a standardized and verified address.
What is address normalization?
Address Normalization or Address Standardisation is the process of formatting free-form or structured addresses according to the postal address standards of the country. The process includes replacing abbreviations with conventional names. For example, "Ansbacher Str. 7, 91541 Rothenburg o.d.T." will be normalized as "Ansbacher Straße 7, 91541 Rothenburg ob der Tauber, Germany".
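The abbreviation-replacement step can be illustrated with a short Python sketch. The `ABBREVIATIONS` table and the `expand_abbreviations` helper are hypothetical and cover only the example above. A real normalizer needs a much larger, country-specific dictionary and also adds missing parts such as the country name.

```python
from typing import Dict

# Hypothetical abbreviation table covering only the example above.
ABBREVIATIONS: Dict[str, str] = {
    "Str.": "Straße",
    "o.d.T.": "ob der Tauber",
}


def expand_abbreviations(address: str, table: Dict[str, str] = ABBREVIATIONS) -> str:
    """Replace each known abbreviation in the address with its full form."""
    for short, full in table.items():
        address = address.replace(short, full)
    return address


if __name__ == "__main__":
    print(expand_abbreviations("Ansbacher Str. 7, 91541 Rothenburg o.d.T."))
```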