💰 IRS SOI

IRS Tax Return Data by ZIP Code

Actual tax return data (income distribution, capital gains, EITC claims, business income, and average tax rates) for every US ZIP code. Not a survey. Real tax filings. Part of the EnrichZip dataset: all 33,000+ US ZIP codes, instant download.

33k+ ZIP codes
21 columns from IRS SOI
138+ total columns
Instant delivery

What you can do with this data

Wealth Managers & Financial Advisors

Prospect in ZIP codes with high concentrations of capital gains filers and high-AGI returns. IRS data shows you where wealth is actually located: not where people say they live, but where they file taxes.

Luxury & Premium Brands

Target marketing and retail site selection using actual income distributions. The percentage of returns over $200K is a more reliable affluence signal than median household income from survey estimates.

Political & Advocacy Organizations

Map EITC prevalence and income brackets to understand economic vulnerability by geography. IRS SOI is the gold standard for income distribution data because it's based on actual tax filings, not survey responses.

AI Financial Applications

Feed structured, verified income and wealth data into AI models for credit risk scoring, market sizing, financial product targeting, or any application requiring reliable geographic income signals.

IRS SOI columns in the dataset

Source: Internal Revenue Service, Statistics of Income Division. All columns are pre-joined and ready to use; no API key or GIS software required.

Column Name                     Description
irs_total_returns               Total tax returns filed
irs_total_agi_000               Total adjusted gross income ($000s)
irs_avg_agi                     Average AGI per return
irs_avg_wages                   Average wages per return with wage income
irs_returns_under_25k           Returns with AGI under $25,000
irs_returns_25k_50k             Returns with AGI $25,000–$50,000
irs_returns_50k_75k             Returns with AGI $50,000–$75,000
irs_returns_75k_100k            Returns with AGI $75,000–$100,000
irs_returns_100k_200k           Returns with AGI $100,000–$200,000
irs_returns_over_200k           Returns with AGI over $200,000
irs_pct_over_200k               % of returns with AGI over $200,000
irs_returns_with_capital_gains  Returns reporting net capital gains
irs_pct_capital_gains           % of returns with capital gains
irs_returns_with_eitc           Returns claiming Earned Income Tax Credit
irs_pct_eitc                    % of returns claiming EITC
irs_returns_with_biz_income     Returns with business or self-employment income
irs_pct_biz_income              % of returns with business income
irs_returns_with_rental         Returns with rental or royalty income
irs_pct_rental_income           % of returns with rental income
irs_total_tax_000               Total income tax liability ($000s)
irs_avg_tax_rate                Average effective tax rate
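
To make the relationship between the count columns and the ratio columns concrete, here is a minimal sketch that recomputes a few ratios from one row. The formulas are assumptions: the dataset's exact definitions, rounding, and percentage scale (0-100 is assumed here) are not stated on this page. The helper name derive_ratios and the sample numbers are made up for illustration.

```python
def derive_ratios(row):
    """Recompute ratio columns from the count and total columns of one ZIP row.

    Assumed definitions, not confirmed by the dataset documentation.
    """
    total = row["irs_total_returns"]
    if total <= 0:
        return None  # nothing to divide by, e.g. a ZIP with no published returns
    agi_000 = row["irs_total_agi_000"]
    return {
        # Totals are stored in thousands of dollars, so scale back up.
        "irs_avg_agi": agi_000 * 1000 / total,
        "irs_pct_over_200k": 100 * row["irs_returns_over_200k"] / total,
        "irs_pct_capital_gains": 100 * row["irs_returns_with_capital_gains"] / total,
        "irs_pct_eitc": 100 * row["irs_returns_with_eitc"] / total,
        # Assumed: effective rate = total tax liability / total AGI.
        "irs_avg_tax_rate": 100 * row["irs_total_tax_000"] / agi_000 if agi_000 > 0 else None,
    }


# Made-up numbers for illustration only, not real IRS figures.
sample = {
    "irs_total_returns": 10_000,
    "irs_total_agi_000": 900_000,
    "irs_returns_over_200k": 800,
    "irs_returns_with_capital_gains": 2_500,
    "irs_returns_with_eitc": 1_200,
    "irs_total_tax_000": 135_000,
}
ratios = derive_ratios(sample)
print(ratios)
```

A check like this is mainly useful as a sanity test after loading the file: if the recomputed values differ from the published columns by more than rounding, the assumed definitions above do not match the dataset's.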

About IRS SOI data

Income data by ZIP code is widely available, but most of it comes from Census surveys, where people self-report their earnings. The IRS Statistics of Income (SOI) division publishes something fundamentally different: actual aggregated tax return data for every ZIP code in the United States. When someone files a return reporting $500,000 in AGI and $120,000 in capital gains, that shows up in the IRS SOI data. No self-reporting bias, no survey underrepresentation of high earners.

This distinction matters enormously for wealth analysis. Survey-based income measures like Census median household income systematically underestimate income in high-wealth ZIP codes, because affluent households are harder to survey and more likely to underreport. IRS data captures actual reported income, making it far more reliable for identifying affluent markets.

The capital gains percentage is the most powerful wealth signal in the IRS dataset. Capital gains income (from stock sales, real estate transactions, and investment portfolio activity) is highly concentrated among high-net-worth individuals. In the wealthiest ZIP codes, 30-40% of returns report capital gains. In working-class ZIP codes, that figure is typically under 5%. This single column can do more work for luxury brand targeting or wealth manager prospecting than a dozen demographic variables from the Census.

The EITC data tells the other side of the story. The Earned Income Tax Credit is a federal credit for low-to-moderate-income workers. ZIP codes where 25%+ of returns claim the EITC are predominantly lower-income markets, a useful filter for social services organizations, affordable housing developers, and mission-driven businesses targeting economic need.
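
The 25% EITC filter described above could be applied along these lines. This is a sketch under assumptions: the ZIP column name "zip", the 0-100 percentage scale, and blank cells for suppressed values are all guesses about the file layout, and the sample rows are made up rather than real IRS figures.

```python
import csv
import io

# Made-up rows for illustration; "zip" is a hypothetical name for the ZIP column.
SAMPLE_CSV = """zip,irs_total_returns,irs_pct_eitc
00001,4200,31.5
00002,9800,6.2
00003,1500,25.0
00004,7300,
"""


def high_eitc_zips(csv_text, threshold=25.0):
    """Return (zip, pct) pairs whose EITC share meets the threshold, highest first."""
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row["irs_pct_eitc"].strip()
        if not raw:
            continue  # skip rows with no published value
        pct = float(raw)
        if pct >= threshold:
            matches.append((row["zip"], pct))
    return sorted(matches, key=lambda item: item[1], reverse=True)


result = high_eitc_zips(SAMPLE_CSV)
print(result)  # [('00001', 31.5), ('00003', 25.0)]
```

Reading the ZIP column as text rather than as a number keeps leading zeros intact, which matters for ZIP codes in the Northeast.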

For AI applications, IRS SOI data is particularly valuable because it's clean, verified, and structured. Feeding a ZIP-level income distribution profile to an AI model, with actual tax bracket distributions rather than a single median estimate, dramatically improves the model's ability to reason about purchasing power, financial product eligibility, and market segmentation.
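
One way to turn a row into a ZIP-level income distribution profile for a model prompt is sketched below. The function name income_profile, the text layout, and the choice to use the sum of the six bracket columns as the denominator are assumptions for illustration; the sample counts are made up.

```python
# Bracket columns from the dataset, paired with human-readable labels.
BRACKETS = [
    ("irs_returns_under_25k", "under $25K"),
    ("irs_returns_25k_50k", "$25K to $50K"),
    ("irs_returns_50k_75k", "$50K to $75K"),
    ("irs_returns_75k_100k", "$75K to $100K"),
    ("irs_returns_100k_200k", "$100K to $200K"),
    ("irs_returns_over_200k", "over $200K"),
]


def income_profile(zip_code, row):
    """Format the AGI bracket counts of one ZIP row as a short text profile."""
    total = sum(row[col] for col, _ in BRACKETS)
    lines = [f"ZIP {zip_code}: AGI distribution of {total:,} returns"]
    for col, label in BRACKETS:
        share = 100 * row[col] / total if total else 0.0
        lines.append(f"- {label}: {share:.1f}%")
    return "\n".join(lines)


# Made-up counts for illustration only, not real IRS figures.
sample_row = {
    "irs_returns_under_25k": 3000,
    "irs_returns_25k_50k": 2500,
    "irs_returns_50k_75k": 1500,
    "irs_returns_75k_100k": 1000,
    "irs_returns_100k_200k": 1500,
    "irs_returns_over_200k": 500,
}
profile = income_profile("00001", sample_row)
print(profile)
```

The resulting text can be pasted into a prompt or stored alongside other ZIP attributes; a structured format such as JSON would work equally well if the downstream tool prefers it.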

Questions about IRS SOI data

Why is IRS data better than Census income estimates?

Census ACS income data is based on surveys: people self-reporting what they earned. IRS SOI data is based on actual tax returns filed with the federal government. It's more accurate for high-income ZIP codes (wealthy people tend to underreport income in surveys), more granular in capturing investment income, and includes metrics like capital gains rates and EITC prevalence that Census simply doesn't publish. For any income or wealth analysis, IRS SOI is the gold standard.

What year is the IRS data?

Our dataset uses IRS SOI ZIP code data for tax year 2021, the most recent year available with complete ZIP-level publication. The IRS publishes this data approximately 2-3 years after the tax year. We update when new SOI data is released.

What does the capital gains percentage tell me?

The percentage of returns with capital gains is one of the strongest wealth indicators available at the ZIP level. Capital gains income (from stocks, real estate sales, and other investments) is highly concentrated among high-net-worth individuals. A ZIP where 20%+ of returns report capital gains is almost certainly an affluent market. Combined with the $200K+ return percentage, it gives a two-factor wealth screen that's far more reliable than survey-based median income.
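
The two-factor wealth screen can be sketched as a simple predicate over the two percentage columns. The 20% capital gains cutoff comes from the text above; the 10% cutoff for $200K+ returns is an arbitrary placeholder, the "zip" key is a hypothetical column name, and the rows are made up for illustration.

```python
def is_affluent(row, min_pct_capital_gains=20.0, min_pct_over_200k=10.0):
    """Two-factor screen: both shares must meet their thresholds (0-100 scale assumed)."""
    return (
        row["irs_pct_capital_gains"] >= min_pct_capital_gains
        and row["irs_pct_over_200k"] >= min_pct_over_200k
    )


# Made-up rows for illustration only, not real IRS figures.
rows = [
    {"zip": "00001", "irs_pct_capital_gains": 34.0, "irs_pct_over_200k": 22.0},
    {"zip": "00002", "irs_pct_capital_gains": 21.0, "irs_pct_over_200k": 4.0},
    {"zip": "00003", "irs_pct_capital_gains": 3.5, "irs_pct_over_200k": 1.0},
]
affluent = [r["zip"] for r in rows if is_affluent(r)]
print(affluent)  # ['00001']
```

Requiring both conditions filters out ZIP codes like the second row, where many filers report capital gains but few returns exceed $200K; both thresholds should be tuned to the market being screened.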

Is this data privacy-compliant?

Yes. The IRS publishes SOI data only as ZIP-level aggregates, never as individual return data. To protect taxpayer privacy, the IRS excludes ZIP codes with fewer than 100 returns from the ZIP-level file and combines income and tax items reported on fewer than 20 returns with other categories. EnrichZip uses only this published aggregate data; we have no access to individual tax returns.

AI-ready data. Instant download.

IRS SOI data combined with 11 other government sources: 153 columns, 33,000+ ZIP codes. Clean, documented, and ready to use with AI tools or drop straight into Excel.

See Pricing →